Hypothesis Testing

Introduction

Be sure to read through the definitions for this section before trying to make sense out of the following.

The first thing to do when given a claim is to write the claim mathematically (if possible) and decide whether it is the null or the alternative hypothesis. If the claim contains equality, or states that there is no change from the given or accepted condition, then it is the null hypothesis. Otherwise, if it represents change, it is the alternative hypothesis.

The following example is not a mathematical example, but may help introduce the concept.

Example

"He's dead, Jim," said Dr. McCoy to Captain Kirk.

Mr. Spock, as the science officer, is put in charge of statistically determining the correctness of Bones' statement and deciding the fate of the crew member (to vaporize or to try to revive).

His first step is to arrive at the hypothesis to be tested.

Does the statement represent a change in previous condition?

Yes, there is change, thus it is the alternative hypothesis, H₁.
No, there is no change, therefore it is the null hypothesis, H₀.

The correct answer is that there is change. Dead represents a change from the accepted state of alive. The null hypothesis always represents no change. Therefore, the hypotheses are:

H₀ : Patient is alive.
H₁ : Patient is not alive (dead).

States of nature are something that you, as a statistician, have no control over. Either the null hypothesis is true, or it isn't. This represents the true nature of things.

Possible states of nature (Based on H₀)

Patient is alive (H₀ true, H₁ false)
Patient is dead (H₀ false, H₁ true)

Decisions are something that you have control over. You may make a correct decision or an incorrect decision. It depends on the state of nature as to whether your decision is correct or in error.

Possible decisions (based on H₀) / conclusions (based on claim)

Reject H₀ / "Sufficient evidence to say patient is dead"
Fail to Reject H₀ / "Insufficient evidence to say patient is dead"

There are four possibilities that can occur based on the two possible states of nature and the two decisions which we can make.

Statisticians never accept the null hypothesis; we only fail to reject it. In other words, we'll say that it isn't true, or that we don't have enough evidence to say that it isn't true, but we'll never say that it is true, because someone else might come along with another sample which shows that it isn't, and we don't want to be wrong.

Statistically (double) speaking ...

	State of Nature
Decision	H₀ True	H₀ False
Reject H₀	Patient is alive, Sufficient evidence of death	Patient is dead, Sufficient evidence of death
Fail to reject H₀	Patient is alive, Insufficient evidence of death	Patient is dead, Insufficient evidence of death

In English ...

	State of Nature
Decision	H₀ True	H₀ False
Reject H₀	Vaporize a live person	Vaporize a dead person
Fail to reject H₀	Try to revive a live person	Try to revive a dead person

Were you right? ...

	State of Nature
Decision	H₀ True	H₀ False
Reject H₀	Type I error (alpha)	Correct assessment
Fail to reject H₀	Correct assessment	Type II error (beta)

Which of the two errors is more serious? Type I or Type II?

Since Type I is (usually) the more serious error, that is the one we concentrate on. We usually pick alpha to be very small (commonly 0.05 or 0.01). Note: alpha is not a Type I error. Alpha is the probability of committing a Type I error. Likewise, beta is the probability of committing a Type II error.
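
The "Were you right?" table can be written as a small lookup. This is only an illustrative sketch: the function name `classify_outcome` and its two boolean parameters are made up for this example and are not part of the original notes.

```python
def classify_outcome(h0_is_true: bool, reject_h0: bool) -> str:
    """Return the cell of the decision table for a state of nature and a decision."""
    if reject_h0:
        # Rejecting a true null hypothesis is a Type I error.
        return "Type I error" if h0_is_true else "Correct assessment"
    # Failing to reject a false null hypothesis is a Type II error.
    return "Correct assessment" if h0_is_true else "Type II error"


# Walk through all four combinations from the table.
for h0_is_true in (True, False):
    for reject_h0 in (True, False):
        state = "H0 true" if h0_is_true else "H0 false"
        decision = "Reject H0" if reject_h0 else "Fail to reject H0"
        print(f"{state:8} | {decision:17} | {classify_outcome(h0_is_true, reject_h0)}")
```

In the Star Trek example, `classify_outcome(True, True)` corresponds to vaporizing a live person (Type I error), and `classify_outcome(False, False)` to trying to revive a dead person (Type II error).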

Conclusions

Conclusions are sentence answers which include whether there is enough evidence or not (based on the decision), the level of significance, and whether the original claim is supported or rejected.

Conclusions are based on the original claim, which may be the null or the alternative hypothesis. The decisions are always based on the null hypothesis.

	Original Claim
Decision	H₀ "REJECT"	H₁ "SUPPORT"
Reject H₀ "SUFFICIENT"	There is sufficient evidence at the (alpha) level of significance to reject the claim that (restate original claim)	There is sufficient evidence at the (alpha) level of significance to support the claim that (restate original claim)
Fail to reject H₀ "INSUFFICIENT"	There is insufficient evidence at the (alpha) level of significance to reject the claim that (restate original claim)	There is insufficient evidence at the (alpha) level of significance to support the claim that (restate original claim)

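The wording pattern for conclusions is mechanical enough to sketch as a function. The name `conclusion` and its parameters are hypothetical, chosen only for this illustration: the decision picks "sufficient" or "insufficient", and the location of the original claim (H₀ or H₁) picks "reject" or "support".

```python
def conclusion(reject_h0: bool, claim_is_null: bool, alpha: float, claim: str) -> str:
    """Build the conclusion sentence from the decision and the original claim."""
    # The decision about H0 determines how much evidence we say there is.
    amount = "sufficient" if reject_h0 else "insufficient"
    # A claim that is H0 can only be rejected; a claim that is H1 can only be supported.
    verb = "reject" if claim_is_null else "support"
    return (
        f"There is {amount} evidence at the {alpha} level of significance "
        f"to {verb} the claim that {claim}."
    )


print(conclusion(reject_h0=False, claim_is_null=True, alpha=0.05, claim="the coin is fair"))
print(conclusion(reject_h0=True, claim_is_null=False, alpha=0.05, claim="the patient is dead"))
```

Note that the sentence never says the claim is accepted or proven; it only speaks of rejecting or supporting it.
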
The process

There is one fundamental guideline when performing hypothesis testing.

All hypothesis testing is done under the assumption
that the null hypothesis is true.

Here's how that works

We make an assumption
We then test that assumption by gathering data and looking at the results
If the results are too unusual to have happened just by chance, then we reject our assumption

Examples

The math used to find these probabilities is explained in a later section. Just try to understand the process and the logic for now.

Example 1: Is the coin fair?

We make an assumption that the coin is fair, that is, that the probability of flipping a head on a single trial will be 1/2.
We gather data and test that claim. Let's say that we flipped the coin 100 times and came up with 55 heads. If the probability of heads really is 1/2, the probability of getting a result at least as far from the expected 50 heads as 55 is (in either direction) is 0.3173.
Remember that there is always sampling error. We should not expect to get exactly 50 heads every time we flip a coin 100 times; there will be some variation because the results are random. The question is whether there's too much variation to be caused by random chance alone. What we have found is that we could get results like these 31.73% of the time just by chance. Something that happens 31.73% of the time is not unusual at all. Normally, we reserve the "unusual" status for things that happen less than 5% of the time. So we fail to reject our assumption that the probability of getting a head is 1/2, and we don't have enough evidence to reject the claim that the coin is fair.

Now, consider the same problem, except this time 65 heads were flipped in 100 trials. We find that the probability of a result at least that far from 50, if the coin really is fair, is 0.0027. That's less than 1/2 of one percent of the time. That is unusual. It is so unusual that it is very unlikely to happen by chance alone; therefore, our assumption must be wrong. We reject our assumption that the probability of heads is 1/2, and there is sufficient evidence to reject the claim that the coin is fair.
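
The notes defer the math to a later section, but the two probabilities quoted above can be reproduced with a short sketch. It assumes a two-tailed test using the normal approximation to the binomial with no continuity correction; that assumption is not stated in the text, though it matches the figures 0.3173 and 0.0027. The function name `two_tailed_p_value` and the constant `ALPHA` are made up for this example, with `ALPHA` set to the 5% cutoff for "unusual" mentioned above.

```python
import math


def two_tailed_p_value(heads: int, tosses: int, p_heads: float = 0.5) -> float:
    """Probability of a result at least this far from the expected count,
    assuming H0 (p_heads) is true. Uses the normal approximation."""
    mean = tosses * p_heads
    sd = math.sqrt(tosses * p_heads * (1 - p_heads))
    z = (heads - mean) / sd
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05  # results rarer than this are treated as "unusual"

for heads in (55, 65):
    p = two_tailed_p_value(heads, 100)
    decision = "reject" if p < ALPHA else "fail to reject"
    print(f"{heads} heads: p = {p:.4f} -> {decision} H0")
# 55 heads: p = 0.3173 -> fail to reject H0
# 65 heads: p = 0.0027 -> reject H0
```

An exact binomial calculation would give somewhat different numbers, but the decisions in this example would be the same.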

If the results are too unusual to happen just by chance,
then the assumption (that the null hypothesis is true)
must be wrong and we reject it.

Example 2: Does pi equal 3.2?

Everybody knows that pi is 3.14, right? Well, Indiana once tried to say it was 3.2 by legislative action. Here is a somewhat longer example to test their claim.

James Jones
