Independent variables can be combined to form new variables. The mean and variance of the combination can be found from the means and the variances of the original variables.
| Combination of Variables | In English (Melodic Mathematics) |
|---|---|
| $\mu_{X+Y} = \mu_X + \mu_Y$ | The mean of a sum is the sum of the means. |
| $\mu_{X-Y} = \mu_X - \mu_Y$ | The mean of a difference is the difference of the means. |
| $\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$ | The variance of a sum is the sum of the variances. |
| $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$ | The variance of a difference is the sum of the variances. |
Since we are combining two variables by subtraction, the important rules from the table above are that the mean of the difference is the difference of the means and the variance of the difference is the sum of the variances.
It is important to note that the variance of the difference is the sum of the variances; it is not the case that the standard deviation of the difference is the sum of the standard deviations. When we go to find the standard error, we must combine variances to do so. You may also wonder why the variance of the difference is the sum of the variances rather than the difference of the variances. When the difference is formed, the coefficient of $-1$ on the second variable gets squared, so the negative becomes positive and the variances add. Besides, variances can't be negative, and the difference of two variances could be.
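As a quick numeric check, suppose $X$ and $Y$ are independent with $\sigma^2_X = 9$ and $\sigma^2_Y = 16$. Then $\sigma^2_{X-Y} = 9 + 16 = 25$, so the standard deviation of the difference is $\sqrt{25} = 5$, not $3 + 4 = 7$ and certainly not $3 - 4 = -1$.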
When the population variances are known, the difference of the means has a normal distribution. The variance of the difference is the sum of the variances divided by the respective sample sizes. This makes sense, hopefully, because according to the central limit theorem, the variance of the sampling distribution of the sample mean is the population variance divided by the sample size, so what we are doing is adding the variance of each mean together. The test statistic is shown below.
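With $\bar{x}_1$ and $\bar{x}_2$ the sample means, $\mu_1 - \mu_2$ the hypothesized difference (usually zero), and $n_1$ and $n_2$ the sample sizes, the statistic takes the standard two-sample form:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$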
When the population variances aren't known, the difference of the means has a Student's t distribution. However, if both sample sizes are large enough, you will be using the normal row from the t-table, so your book lumps this case under the normal distribution rather than the t distribution. This gives us the chance to work the problem without knowing whether the population variances are equal. The test statistic is shown below and is identical to the one above, except that the sample variances are used instead of the population variances.
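Here $s_1^2$ and $s_2^2$ are the sample variances standing in for $\sigma_1^2$ and $\sigma_2^2$:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$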
Ok, you're probably wondering how you know whether the variances are equal if you don't know what they are. Some books teach the F-test for the equality of two variances, and if your book does, then you should use the F-test to decide. Other books (statisticians) argue that if you perform the F-test first to see whether the variances are equal, and then use the same level of significance to perform the t-test for the difference of the means, the overall level of significance is no longer what you claimed. So the Bluman text tells the student whether or not the variances are equal, and the Triola text makes the students test it for themselves.
Since you don't know the population variances, you're going to be using a Student's t distribution. Since the variances are unequal, no attempt is made to average them together as we will in the next situation. The degrees of freedom is the smaller of the two individual degrees of freedom ($n - 1$ for each sample). The "min" function means take the minimum, or smaller, of the two values. Otherwise, the formula is the same as the one we used with large sample sizes.
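In symbols, with the smaller of the two degrees of freedom:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad df = \min(n_1 - 1,\; n_2 - 1)$$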
If the variances are equal, then an effort is made to average them together. Now, equal does not mean identical: it is possible for two variances to be statistically equal but numerically different. We will find a pooled estimate of the variance, which is simply the weighted mean of the two sample variances, with the degrees of freedom as the weighting factors.
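As a weighted mean with weights $n_1 - 1$ and $n_2 - 1$, the pooled estimate is

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$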
Once the pooled estimate of the variance is computed, this mean (average) variance is used in place of the individual sample variances. Otherwise, the formula is the same as before. The degrees of freedom are the sum of the individual degrees of freedom.
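Substituting $s_p^2$ for both sample variances gives

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$$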