Minitab Instructions for Hypothesis Testing Technology Exercises

This document is set up a little differently than other sets of instructions. Several tasks are common to all of the questions, so I'm going to explain how to do each task in its own section rather than within the context of a particular problem.

Randomly Selecting Dates

A couple of problems ask you to randomly select 16 dates from a certain time period. This example assumes you want to pick from the dates January 1, 2005 through December 31, 2005. Adjust the dates accordingly.

Generating the Dates to Pick From

Label a column as dates
Choose Calc / Make Patterned Data / Simple Set of Date/Time Values
1. Store the patterned data in dates
2. The first date/time is 01/01/2005
3. The last date/time is 12/31/2005
4. The steps are 1 day each.
5. You want each value listed once and the entire sequence listed once
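If you want to double-check the list of dates outside Minitab, here is a minimal sketch in Python using only the standard library's datetime module. It builds the same one-value-per-day sequence; the 2005 start and end dates are the example's, so change them to match your problem. The helper name date_range is made up for this illustration.

```python
from datetime import date, timedelta

def date_range(first, last):
    """Return every date from first through last, inclusive, in 1-day steps."""
    days = (last - first).days
    return [first + timedelta(days=i) for i in range(days + 1)]

dates = date_range(date(2005, 1, 1), date(2005, 12, 31))
print(len(dates), dates[0], dates[-1])  # 365 2005-01-01 2005-12-31
```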

Randomly Selecting the Dates

Label a column as picked. You may want to use a more descriptive name, especially if you are going to take more than one sample (as in the case of the temperature question).
Choose Calc / Random Data / Sample From Columns
1. You want to sample 16 rows (use the number of dates you need)
2. From the dates column
3. Storing the sample in the picked column
4. Do not sample with replacement
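The sampling step can also be sketched in Python. This is only an illustration of sampling without replacement, not how Minitab works internally: random.sample never returns the same element twice. The names dates and picked mirror the column names above.

```python
import random
from datetime import date, timedelta

# All dates in the example period (January 1 through December 31, 2005)
first = date(2005, 1, 1)
dates = [first + timedelta(days=i) for i in range(365)]

# random.sample draws without replacement, so no date can be picked twice
picked = random.sample(dates, 16)
picked.sort()  # optional: list the picked dates in calendar order

for d in picked:
    print(d.strftime("%m/%d/%Y"))
```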

Although the actual dates aren't necessary for the analysis, you may want to leave them in the worksheet and then put the data value for that date in the column next to it, matching dates with values. You will need to put the dates on the assignment you turn in.

Critical Values

To look up a critical value, you need to know three things.

The distribution being used. This is the standard normal distribution (z) for proportions or the Student's t distribution (t) for means. If you use a t distribution, you will also need to know the degrees of freedom.
The significance level α. If no value is given, use the default α = 0.05.
Whether it is a one tail (left tail or right tail) or a two tail test. This comes from the alternative hypothesis, H₁.

Using the t-table

At the top of the t-table, there is a row for one tail areas and a row for two tail areas. Pick the appropriate row based on the type of test.
Scan across that row and find your significance level α.
Use that column and go down until you find the proper degrees of freedom. If you're using the normal distribution, go all the way to the bottom row.
Read your critical value(s) from the table. There should always be as many critical values as tails and the signs should agree. That is, a left tail test should have a negative critical value, a right tail test should have a positive critical value, and a two tail test should have both a positive and negative critical value.

Using Minitab

Minitab requires the area to the left of the critical value.

Use the appropriate distribution
1. Choose Calc / Probability Distributions / Normal if you are using the standard normal distribution
2. Choose Calc / Probability Distributions / t if you are using the Student's t distribution
Choose Inverse cumulative probability
Enter the degrees of freedom if you have a Student's t distribution
Choose Input constant and enter the area to the left of the critical value for the constant
1. If it is a left tail test, then the area to the left is the significance level α
2. If it is a right tail test, then the area to the left is 1 - α
3. If it is a two tail test, then the area to the left of the left critical value is α / 2. Once you find this critical value, the right critical value is its opposite.
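The same inverse cumulative probability lookups can be sketched in Python for the standard normal distribution using the standard library's statistics.NormalDist.inv_cdf. The function name z_critical_values and the tail strings are made up for this illustration. The standard library has no t distribution; if you have SciPy installed, scipy.stats.t.ppf(area, df) plays the same role for t.

```python
from statistics import NormalDist

def z_critical_values(alpha, tail):
    """Critical value(s) for a z test; tail is 'left', 'right', or 'two'."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        return [z.inv_cdf(alpha)]        # area to the left is alpha
    if tail == "right":
        return [z.inv_cdf(1 - alpha)]    # area to the left is 1 - alpha
    if tail == "two":
        left = z.inv_cdf(alpha / 2)      # area to the left is alpha / 2
        return [left, -left]             # right critical value is its opposite
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("left", "right", "two"):
    print(tail, [round(c, 4) for c in z_critical_values(0.05, tail)])
```

For α = 0.05 this gives about -1.645 (left), 1.645 (right), and ±1.96 (two tail), which should agree with the bottom row of the t-table. Notice that the signs follow the rule above: one critical value per tail, on the same side as the tail.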

Test Statistics, P-Values, and Confidence Intervals

This is the crux of the hypothesis test.

Choosing the Right Test

You will need to go to Stat / Basic Statistics and choose the proper kind of test. Once you have the proper test chosen, the process is very similar for each of the tests.

Test about population mean(s)

1 Sample Z: Use this when you're testing a claim about a single population mean μ and you know the population standard deviation σ. This is almost never the case.
1 Sample T: Use this when you're testing a claim about a single population mean μ and you do not know the population standard deviation σ. This is almost always the case when you're testing a single mean.
2 Sample T: Use this when you're testing a claim about two independent means μ₁ and μ₂. This is when you have two different samples that you want to compare.
Paired T: Use this when you're using two dependent samples. This is better described as one sample with two different measurements for each object in the sample.

Test about population proportion(s)

1 Proportion: Use this when you're testing a claim about a single population proportion p.
2 Proportions: Use this when you're testing a claim about two population proportions p₁ and p₂.

When you are testing a proportion, it will ask for the number of trials (n) and the number of events (x). The number of events is what we called the number of successes, but it is important to note that these are both whole numbers, not percents or proportions. If you have a sample size of 425 with a 37% success rate, then the number of successes would be 0.37 * 425 = 157.25, but since it has to be a whole number, you would round that to be 157 and enter that for the number of successes.
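Here is the rounding step written out in Python. number_of_events is a hypothetical helper name; if your rate is given as a percent (37%), divide by 100 first.

```python
def number_of_events(sample_size, proportion):
    """Hypothetical helper: turn a sample proportion into a whole number of events."""
    # round() returns an int here. Python rounds an exact .5 to the nearest even
    # number, which only matters if the product ends in exactly .5.
    return round(proportion * sample_size)

print(number_of_events(425, 0.37))  # 157
```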

When working with proportions in this class, there are check boxes under Options that should be selected; they are listed in the One Proportion and Two Proportions sections below.

One Sample vs Two

A little note about one sample versus two samples. When you are working with one sample, you need a numerical value that you can test against. That is, you will have to have a claimed value for the population mean or the population proportion. Something like μ = 15 or p = 0.34.

When you are working with two samples, you don't need a specific value; you are comparing the samples to each other. You'll take a claim like μ₁ = μ₂, but there is no claim that they're both equal to another value like μ₁ = μ₂ = 15. When you actually get around to performing the hypothesis test, you'll move both variables to the same side so the right side is 0 and get μ₁ - μ₂ = 0. This test is sometimes called "the difference of the means." The confidence interval generated will be for the difference of the means, not for each mean, so it is centered at the difference of the sample means, x̄₁ - x̄₂, not at 0. The claim μ₁ - μ₂ = 0 is consistent with the data when that interval contains 0.

Options

Each hypothesis test has an Options button. This is where you go to set specific information about the test.

In particular, you set these options:

confidence level: This is entered as a percent, but it is based on your significance level: the confidence level is 100(1 - α)%. A significance level of 0.05 would be a confidence level of 95%, which is the default value. A significance level of 0.01 would be a confidence level of 99%.
test value: This is the value from the null hypothesis that the mean or proportion is assumed to be. This will say test mean, test difference, test proportion, etc., based on what you're testing.
alternative: This is where you tell Minitab whether it is a left tail test, right tail test, or two tail test by choosing the inequality from the alternative hypothesis.
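A small sketch of how α and the alternative hypothesis translate into the Options settings. The function name is hypothetical, and the alternative labels ("less than", "not equal", "greater than") are assumptions; the exact wording in the drop-down may differ between Minitab versions.

```python
def minitab_options(alpha, h1_operator):
    """Hypothetical helper: turn alpha and H1's inequality into Options settings."""
    confidence_level = round(100 * (1 - alpha), 6)  # as a percent, 100(1 - alpha)
    # Assumed drop-down labels; your Minitab version may word these differently.
    alternatives = {"<": "less than", "≠": "not equal", ">": "greater than"}
    return confidence_level, alternatives[h1_operator]

print(minitab_options(0.05, "≠"))  # (95.0, 'not equal')
print(minitab_options(0.01, ">"))  # (99.0, 'greater than')
```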

Interpreting the output

Each output screen will contain three main pieces of information for each hypothesis test.

Sample output from a One Sample T Test

This problem started with H₀: μ = 38, H₁: μ ≠ 38; α = 0.05, n = 15, x̄ = 36, s = 4.

One-Sample T 

Test of mu = 38 vs not = 38


 N     Mean   StDev  SE Mean        95% CI            T      P
15  36.0000  4.0000   1.0328  (33.7849, 38.2151)  -1.94  0.073


The three pieces of information on the right (the 95% CI, T, and P) are the ones that we're concerned with here.
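To see where the numbers in this output come from, here is a sketch that recomputes SE Mean, T, and P from the example's summary statistics. SE Mean is s / √n and T is (x̄ - 38) / SE Mean. Because Python's standard library has no t distribution, the sketch approximates the area under the t curve with Simpson's rule; t_cdf is written for this illustration, not taken from any library.

```python
import math

def t_cdf(x, df, steps=2000):
    """Approximate area to the left of x under Student's t distribution.

    Integrates the t density from 0 to |x| with Simpson's rule (steps must be
    even) and uses the symmetry of the distribution about 0.
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = density(0) + density(a)
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

# Summary data from the example: H0: mu = 38, H1: mu != 38
n, xbar, s, mu0 = 15, 36.0, 4.0, 38.0

se = s / math.sqrt(n)                  # SE Mean
t = (xbar - mu0) / se                  # test statistic T
left_area = t_cdf(t, n - 1)            # area to the left of T, df = n - 1
p = 2 * min(left_area, 1 - left_area)  # two tail p-value

print(f"SE Mean = {se:.4f}, T = {t:.2f}, P = {p:.3f}")
# prints: SE Mean = 1.0328, T = -1.94, P = 0.073
```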

Confidence Interval

The way the confidence interval is described depends on the alternative hypothesis and the type of test. Although the examples below are about a single population mean μ, you should use p if you're doing a test about a proportion. If you're doing a test about two means, then it would be μ₁ - μ₂ in the middle of the confidence interval.

Two tailed test: The label will be "95% CI" and the interval will be given in interval notation like above, (33.7849, 38.2151). When you write this on your paper, you should write it as 33.7849 < μ < 38.2151.
Left tailed test: The label will be "95% Upper Bound" and a single number, like 37.8191 will be given. An upper bound is the largest plausible value of μ, but μ could be less. You would write your interval as μ < 37.8191.
Right tailed test: The label will be "95% Lower Bound" and a single number, like 34.1809 will be given. A lower bound is the smallest plausible value of μ, but μ could be more. You would write your interval as μ > 34.1809.

In all cases, notice that the direction of the confidence interval agrees with the alternative hypothesis.
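The three kinds of interval can be reproduced from the same summary data (n = 15, x̄ = 36, s = 4). The sketch below reuses a Simpson's-rule approximation of the t distribution and finds critical values by bisection; t_cdf and t_inverse are hypothetical helpers written for this illustration. The one-tail bounds it prints match the 37.8191 and 34.1809 quoted above, so those examples appear to come from this same data.

```python
import math

def t_cdf(x, df, steps=2000):
    """Approximate area to the left of x under Student's t (Simpson's rule + symmetry)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = density(0) + density(a)
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inverse(area, df):
    """Value with the given area to its left, found by bisection on t_cdf."""
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, xbar, s, alpha = 15, 36.0, 4.0, 0.05
se = s / math.sqrt(n)
df = n - 1

t_two = t_inverse(1 - alpha / 2, df)  # two tail: alpha / 2 in each tail
t_one = t_inverse(1 - alpha, df)      # one tail: all of alpha in one tail

two_tail = (xbar - t_two * se, xbar + t_two * se)
upper_bound = xbar + t_one * se       # left tailed test
lower_bound = xbar - t_one * se       # right tailed test

print(f"Two tail:   {two_tail[0]:.4f} < mu < {two_tail[1]:.4f}")
print(f"Left tail:  mu < {upper_bound:.4f}")
print(f"Right tail: mu > {lower_bound:.4f}")
```

Each interval is the sample mean plus or minus a critical value times SE Mean; the one-tail versions simply drop the side the alternative hypothesis does not point toward.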

Test Statistic

This will be labeled with the name of the distribution. In some cases it might have the word "-value" attached to the end of it like "z-value".

There is only one test statistic for any hypothesis test, no matter whether it is a left tail test, right tail test, or two tail test. In fact, the type of test has absolutely no bearing on whether your test statistic is positive or negative. However, if you have a left tail test with a positive test statistic or a right tail test with a negative test statistic, you might want to double check that you chose the correct alternative hypothesis. You might be better off doing a two tail test.

Probability Value (p-value)

This will be labeled as P or sometimes P-Value. It is usually the last value in the output.

The p-value is the area beyond the test statistic where "beyond" depends on the type of test.

For a left tail test, the p-value is the area to the left of the test statistic.
For a right tail test, the p-value is the area to the right of the test statistic.
For a two tail test, the p-value is twice the smaller of the areas to the left or right of the test statistic.
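The three rules above, sketched in Python for a z statistic. p_value is a hypothetical helper; it takes the distribution's cumulative area function as a parameter and defaults to the standard normal (statistics.NormalDist().cdf). For a t test you would pass a t distribution's cumulative area function instead, for example SciPy's scipy.stats.t(df).cdf, if you have SciPy.

```python
from statistics import NormalDist

def p_value(stat, tail, cdf=NormalDist().cdf):
    """Area beyond the test statistic; tail is 'left', 'right', or 'two'."""
    left_area = cdf(stat)         # area to the left of the test statistic
    right_area = 1 - left_area    # area to the right of the test statistic
    if tail == "left":
        return left_area
    if tail == "right":
        return right_area
    if tail == "two":
        return 2 * min(left_area, right_area)
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(round(p_value(1.96, "two"), 4))  # 0.05
```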

Specific Tests

Here are some notes about each of the tests. The information here is how that particular test is different from the generalized instructions above.

One Sample T Test

Use this when you are testing a claim about the mean of a single population.

You have your choice of using data from a worksheet (using Samples in columns) or the summarized data. Be sure to check the box that says Perform hypothesis test if you want to do a hypothesis test. The hypothesized mean is the claimed value from the null hypothesis.

Two Sample T Test

Use this when you are testing a claim about two independent population means. What you will actually be testing is the difference of the means.

You have three choices for the type of data

Samples in one column is when you have one column that contains the actual data values (samples) and another column that contains a categorical variable identifying the group (subscripts). The subscript column can only contain two unique values, and the results will appear in alphabetical order unless you change it. That means sample 1 is the group that comes first alphabetically and sample 2 is the group that comes second, which matters if you need to use a one tail test.
Samples in different columns is when you have two columns that contain the actual data values. Your columns will probably be labeled with the name of the group, which is what you would have used in the categorical variable had you chosen samples in one column.
Summarized data is when you don't have the actual data, but you have summary statistics.

You should not need to check the "Assume equal variances" box. Whether to check it could be decided with an F-test for equality of the variances, but for the most part, it's safe to assume that the variances are not equal.
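Minitab handles the details, but for reference, here is a sketch of the unequal-variances (Welch) two-sample t statistic from summary data, which is what you get when the "Assume equal variances" box is left unchecked. The summary numbers are hypothetical, and welch_t is a made-up name; the degrees of freedom come from the Welch-Satterthwaite formula, which is generally not a whole number.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2, test_difference=0.0):
    """Unpooled two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error of the sample 1 mean
    v2 = sd2 ** 2 / n2   # squared standard error of the sample 2 mean
    se = math.sqrt(v1 + v2)  # standard error of the difference of the means
    t = (mean1 - mean2 - test_difference) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary data, not from any exercise
t, df = welch_t(20, 3, 10, 18, 4, 12)
print(f"T = {t:.3f}, DF = {df:.2f}")
```

The test_difference parameter is the "test difference" from Options; it stays 0 for a claim like μ₁ = μ₂.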

Paired T Test

Use this when you are testing a claim about two dependent or paired samples. This is actually the same sample with two measurements or values for each case. The process is to form a new variable by subtracting the second sample from the first and then performing a One Sample T Test on that difference. This is sometimes called the mean of the differences and is symbolized as μ_d.

There are two choices here.

Samples in columns is what you use when you have the original data, one set in each column.
Summarized data (differences) is what you use if you have already subtracted the two variables to form the difference and you have the summarized data. Note that this is identical to running a one sample t test with that new variable, the differences.
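The subtract-then-one-sample-t process can be sketched like this. The paired values are hypothetical; statistics.stdev gives the sample standard deviation (dividing by n - 1), and the test is of μ_d = 0.

```python
import math
import statistics

# Hypothetical paired measurements: two values for each of five cases
first = [10, 12, 9, 11, 13]
second = [9, 11, 9, 10, 11]

# Subtract the second sample from the first, case by case
diffs = [a - b for a, b in zip(first, second)]

# One sample t test on the differences, testing mu_d = 0
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation of the differences
t = (d_bar - 0) / (s_d / math.sqrt(len(diffs)))

print(diffs, round(t, 4))  # [1, 1, 0, 1, 2] 3.1623
```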

One Proportion

Use this when you are testing a claim about a single population proportion.

There are two ways to specify the information.

Samples in columns is what you use when you have a categorical variable with exactly two choices. The first choice (alphabetically unless a different order is specified) is used as failure and the second choice is used as success. For example, if the two choices are "yes" and "no", then "yes" is a success. But if your two choices are "male" and "female", then "male" is a success (because "female" comes first alphabetically), and you may need to adjust your claim. The other way to fix things is to click in the column, right-click, choose Column / Value order, and then specify the order. The last value specified will be considered success.
Summarized data is when you already have the sample size and number of successes (events).

Be sure to check the "use test and interval based on normal distribution" box under Options.
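A sketch of both pieces for one proportion: counting events the way described above (the value that sorts last is the success) and computing the normal-approximation test statistic z = (p̂ - p₀) / √(p₀(1 - p₀)/n). The helper names are hypothetical, and pairing the 157 out of 425 from earlier with the claim p = 0.34 is just for illustration.

```python
import math
from statistics import NormalDist

def count_events(column):
    """Return (n, x), treating the value that sorts last as the event (success)."""
    # sorted() compares character codes, so keep capitalization consistent
    event = sorted(set(column))[-1]
    return len(column), column.count(event)

def one_proportion_z(x, n, p0):
    """Normal-approximation z statistic for H0: p = p0."""
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

print(count_events(["yes", "no", "yes", "yes"]))  # (4, 3)

z = one_proportion_z(157, 425, 0.34)
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two tail p-value
print(round(z, 3), round(p, 4))
```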

Two Proportions

Use this when you are testing a claim about two population proportions, like p₁ = p₂. What you will really be testing is the difference in the proportions, p₁ - p₂ = 0.

There are three ways to specify your data.

Samples in one column is when you have one categorical column containing the success / failure (yes/no) values (samples) and another categorical column containing labels for the groups (subscripts). The grouping variable can only have two values and sample 1 will be the one that comes first alphabetically unless you change the order.
Samples in different columns is when you have two categorical columns that contain the success / failure (yes/no) values. The columns should be labeled with the names of the groups.
Summarized data is when you have the counts for each sample.

Be sure to check the "use pooled estimate of p for test" box under Options.
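Here is a sketch of the pooled two-proportion test statistic, which is what checking "use pooled estimate of p for test" refers to: the two samples are combined into one estimate p̄ = (x₁ + x₂)/(n₁ + n₂) for the standard error. The counts are hypothetical and the function name is made up.

```python
import math

def two_proportion_pooled_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate of p
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts, not from any exercise
z = two_proportion_pooled_z(45, 100, 30, 100)
print(round(z, 3))  # 2.191
```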