Technology Exercise 3: Inferential Statistics

First Amendment (Question 1)

Generating 200 samples of size 1002 (Question 1a)

Don't bother to label your columns; we'll refer to them as C1, C2, C3, etc. You can specify a range of columns as C1-C10 (to refer to the ten columns C1, C2, C3, ..., C10).

This problem satisfies the conditions of a binomial experiment: there is a fixed number of independent trials, each having exactly two possible outcomes. The number of trials is 1002, one person's answer doesn't affect another's, and there are two possible answers (opposed or not opposed). Note that "not opposed" isn't the same as "in favor of". It might include those who are undecided or don't care. But we need two categories, success or failure, so success is someone who is opposed and failure is everyone else.

Since this is a binomial problem, each of the 1002 persons is a Bernoulli trial with a 0.58 probability of success.

  1. Choose Calc / Random Data / Bernoulli
    1. Generate 1002 rows of data
    2. Store them in C1-C200 (columns 1-200). This allows you to do all 200 trials at once. Pretty nifty, huh?
    3. The probability of success is 0.58.
    4. Click OK
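
For readers who want to see what this step produces outside of Minitab, here is a minimal Python sketch of the same simulation. It is not part of the assignment. The names samples and successes and the seed value are made up for illustration, and the random numbers will not match Minitab's.

```python
import random

N_SAMPLES = 200      # one sample per column, like C1-C200
SAMPLE_SIZE = 1002   # rows of data in each column
P_SUCCESS = 0.58     # probability that a respondent is opposed

rng = random.Random(2024)  # arbitrary seed, only so the run is repeatable

# samples[j][i] is 1 (success) or 0 (failure), like row i+1 of column C(j+1)
samples = [
    [1 if rng.random() < P_SUCCESS else 0 for _ in range(SAMPLE_SIZE)]
    for _ in range(N_SAMPLES)
]

# Number of successes in each of the 200 samples
successes = [sum(column) for column in samples]
print(successes[:5])
```

Each entry of successes should land somewhere near 0.58 x 1002, or about 581.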

Confidence Intervals (Question 1b)

  1. Choose Stat / Basic Statistics / 1 Proportion
    1. The samples are in columns C1-C200
    2. Click on Options
      1. Check the box that says "Use test and interval based on normal distribution"
      2. Optionally (we won't use it here, but my example uses it) change the test proportion to be 0.58
      3. Click OK
    3. Click OK
  2. Look at each of the 200 lines of output. Each one will contain a confidence interval inside parentheses like (0.557347, 0.618302)
  3. Count the number of times that 0.58 does not fall in the confidence interval. This happens when the lower number is more than 0.580000 or the higher number is less than 0.580000.
  4. Subtract the number of times that the interval does not contain 0.58 from 200 to find the number of times the value fell in the confidence interval
  5. Find the percent of times the value of 0.58 fell in the confidence interval.
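
The counting in steps 3 through 5 can be sketched in Python as follows. This assumes the normal-based interval is the usual p-hat plus or minus z times sqrt(p-hat(1 - p-hat)/n). The function names and the short list of counts are made up for illustration; in the exercise you would have 200 counts, one per column.

```python
from math import sqrt
from statistics import NormalDist


def normal_interval(x, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def count_misses(success_counts, n, true_p):
    """Count the intervals that do NOT contain true_p."""
    misses = 0
    for x in success_counts:
        low, high = normal_interval(x, n)
        if low > true_p or high < true_p:
            misses += 1
    return misses


counts = [619, 581, 547, 590]  # hypothetical success counts, one per sample
misses = count_misses(counts, 1002, 0.58)
covered = len(counts) - misses
percent_covered = 100 * covered / len(counts)
print(misses, covered, percent_covered)
```

The test inside count_misses is the same rule as step 3: the interval misses when its lower end is above 0.58 or its upper end is below 0.58.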

Sample Output

Do NOT copy all of the output into your Word document. Make a chart like the one below. That is, copy the header and then each case where 0.58 did NOT fall in the interval. Then go ahead and answer the question about the percent of the time that 0.58 did fall in the interval.

Please note that your Z-Values and P-Values are not used here, but will look drastically different if you don't change the test proportion to 0.58 as indicated in the optional step above.

Variable    X     N  Sample p         95% CI         Z-Value  P-Value
C2        619  1002  0.617764  (0.587677, 0.647852)     2.42    0.015
C22       547  1002  0.545908  (0.515080, 0.576736)    -2.19    0.029
C32       613  1002  0.611776  (0.581601, 0.641952)     2.04    0.042
C68       619  1002  0.617764  (0.587677, 0.647852)     2.42    0.015
C105      624  1002  0.622754  (0.592743, 0.652766)     2.74    0.006
C156      629  1002  0.627745  (0.597813, 0.657676)     3.06    0.002

In my sample, the interval failed to contain 0.58 six times. That means 194 of the 200 confidence intervals, or 97%, contained the true population proportion. That's pretty close to the 95% implied by the confidence level.
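
Although the Z-Values and P-Values aren't used in this exercise, here is a rough Python sketch of where they come from. It assumes a two-sided one-proportion z test with the test proportion set to 0.58; the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """Two-sided z test of a sample proportion against p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Success counts taken from the X column of the sample output
for x in (619, 547, 613, 624, 629):
    z, p_value = one_proportion_z(x, 1002, 0.58)
    print(x, round(z, 2), round(p_value, 3))
```

The printed values should agree, to rounding, with the Z-Value and P-Value columns in the sample output.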

Finding the Means (Question 1c)

  1. Choose Stat / Basic Statistics / Store Descriptive Statistics
    1. The variables are C1-C200
    2. Click on Statistics and check Sum. Uncheck everything else.
    3. Click OK. What this command did was to add up the number of 1's in each column and create 200 new columns called sum1, sum2, ..., sum200 that each contain the number of successes for each of the 200 samples.
  2. Unfortunately, those 200 samples need to be in one column so we can work with them instead of in 200 columns. This step will fix that. Choose Data / Stack / Columns.
    1. Stack the columns sum1-sum200 (or you could use C201-C400)
    2. Store the stacked data into a new worksheet. You can leave the name of the worksheet blank or give it a name like "amendment" to make it easier to find.
    3. Make sure the Use variable names in subscript column box is not checked.
    4. Click OK.
  3. The new worksheet contains two columns. The first is called Subscripts and contains the numbers 1 through 200. The second, C2, isn't labeled and contains the number of successes for each of the 200 trials. Label C2 as "x". In a binomial experiment, x represents the number of successes.
  4. Label a blank column as "p" for p-hat. When dealing with proportions, p-hat (a p with a caret symbol over it) is the sample proportion.
  5. Choose Calc / Calculator
    1. Store the results into p
    2. The expression is "x /1002". The 1002 is our sample size.
    3. Click OK
  6. Choose Stat / Basic Statistics / Graphical Summary
    1. The variable is p
    2. Click OK
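
As a cross-check on what the Graphical Summary should show, here is a small Python sketch that builds 200 success counts, converts them to p-hat, and summarizes them. It is only an illustration: the seed and variable names are made up, and the comparison values rely on the standard result that p-hat has mean p and standard deviation sqrt(p(1 - p)/n).

```python
import random
from math import sqrt
from statistics import mean, stdev

SAMPLE_SIZE = 1002
P_SUCCESS = 0.58
rng = random.Random(11)  # arbitrary seed, only so the run is repeatable

# x plays the role of the stacked column of success counts (200 values)
x = [
    sum(1 for _ in range(SAMPLE_SIZE) if rng.random() < P_SUCCESS)
    for _ in range(200)
]

# Same calculation as the Calculator step: p = x / 1002
p_hat = [count / SAMPLE_SIZE for count in x]

mean_p_hat = mean(p_hat)
sd_p_hat = stdev(p_hat)
theoretical_sd = sqrt(P_SUCCESS * (1 - P_SUCCESS) / SAMPLE_SIZE)

print(round(mean_p_hat, 4), round(sd_p_hat, 4), round(theoretical_sd, 4))
```

The mean of the p-hat values should be close to 0.58, and their standard deviation should be close to the theoretical value of about 0.0156.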

When you copy this into Word, be sure to drag the box a little bit bigger so that you can read all the numbers.

Be sure to close the worksheet that contains the data in C1-C200 and Sum1-Sum200. If you don't, the file will be about 75 times bigger than it needs to be and can easily fill up all of the allocated space on the network drive. When this happened, students were unable to save their work and lost everything they had done. So be sure to close that first worksheet before saving your file.

Central Limit Theorem (Question 2)

Create a Discrete Probability Distribution (Question 2a)

A probability distribution is a list of all the values that a random variable x can assume along with their associated probabilities.

Here is an example of a probability distribution.

x        -5     1     3    10
p(x)    0.24  0.15  0.36  0.25

Your probability distribution has to have at least 5 values for x. Remember that the probabilities have to add up to 1.

Enter your probability distribution into a table in Word. Add one more column for the total and two more rows to the table for xp(x) and x²p(x) and complete them. Your table should look similar to this:

x        -5     1     3    10   Total
p(x)    0.24  0.15  0.36  0.25   1.00
xp(x)   -1.2  0.15  1.08   2.5   2.53
x²p(x)     6  0.15  3.24    25  34.39

You will need to use the values from the total column to answer part b. That's not done with Minitab.
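
If you want to check your table arithmetic, the following Python sketch reproduces the totals for the example distribution. It also shows the standard formulas that use those totals, on the assumption that part b asks for the mean and standard deviation: the mean is the total of xp(x), and the variance is the total of x²p(x) minus the mean squared.

```python
from math import sqrt

x = [-5, 1, 3, 10]
p = [0.24, 0.15, 0.36, 0.25]

total_p = sum(p)                                      # must equal 1
total_xp = sum(xi * pi for xi, pi in zip(x, p))       # total of the xp(x) row
total_x2p = sum(xi**2 * pi for xi, pi in zip(x, p))   # total of the x^2 p(x) row

mu = total_xp                      # mean of the distribution
variance = total_x2p - mu**2       # variance = sum of x^2 p(x) minus mean squared
sigma = sqrt(variance)             # standard deviation

print(round(total_p, 2), round(total_xp, 2), round(total_x2p, 2))
print(round(mu, 2), round(variance, 4), round(sigma, 4))
```

For the example table, this gives totals of 1.00, 2.53, and 34.39, matching the Total column above.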

Creating the Worksheet

  1. Label the first column as x and the second column as p
  2. Enter the values for x into the first column
  3. Enter the corresponding probabilities into the second column
  4. Label the third, fourth, and fifth columns as mean4, mean25, and mean100, but don't put anything in them yet.

Simulating Samples (Question 2c)

  1. Choose Calc / Random Data / Discrete
    1. Generate 1000 rows of data
    2. Where you store your data depends on your sample size.
      1. When n=4, store the data in C101-C104
      2. When n=25, store the data in C101-C125
      3. When n=100, store the data in C101-C200
    3. The values are in the x column
    4. The probabilities are in the p column
    5. Click OK
  2. Choose Calc / Row Statistics
    1. Select the Mean as the statistic to calculate
    2. The Input variables depends on your sample size
      1. When n=4, use C101-C104
      2. When n=25, use C101-C125
      3. When n=100, use C101-C200
    3. Where you store the result depends on your sample size
      1. When n=4, use mean4
      2. When n=25, use mean25
      3. When n=100, use mean100
    4. Click OK
  3. Repeat steps 1 and 2 with n=25 and n=100
  4. Choose Stat / Basic Statistics / Graphical Summary
    1. The variables are mean4, mean25, and mean100
    2. Click OK
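
The same simulation can be sketched in Python to see what the three columns of means should look like. This uses the example distribution from above rather than your own, and the seed and names are made up for illustration. Each call draws 1000 rows of n values and keeps the mean of each row, like the Row Statistics step.

```python
import random
from statistics import mean, stdev

x = [-5, 1, 3, 10]
p = [0.24, 0.15, 0.36, 0.25]
ROWS = 1000
rng = random.Random(7)  # arbitrary seed, only so the run is repeatable


def simulate_sample_means(n):
    """Return ROWS sample means, each from a sample of size n."""
    means = []
    for _ in range(ROWS):
        draws = rng.choices(x, weights=p, k=n)  # one row of n values
        means.append(sum(draws) / n)
    return means


# Keys 4, 25, 100 play the roles of the mean4, mean25, and mean100 columns
results = {n: simulate_sample_means(n) for n in (4, 25, 100)}

for n, means in results.items():
    print(n, round(mean(means), 3), round(stdev(means), 3))
```

All three columns should center near the mean of the distribution (2.53 for the example), while the spread should shrink as n grows. That shrinking spread is what the Central Limit Theorem predicts.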

Creating the Graph for a Hypothesis Test (Question 3)

You are asked to create the graph for a two-tailed hypothesis test.

  1. Label two columns of the worksheet as z and p
  2. Choose Calc / Make Patterned Data / Simple Set of Numbers
    1. Store the patterned data in z
    2. Start at -3.5
    3. Go to 3.5
    4. Use a step size of 0.1
  3. Choose Calc / Probability Distributions / Normal
    1. Choose Probability Density
    2. The input column is z
    3. The storage is p
  4. Choose Graph / Scatterplot / With Connect Line
    1. The y variable is p and the x variable is z
    2. Choose Scale
      1. Under Axes and Ticks, uncheck all boxes for the high y-scale and the high x-scale. Check all boxes for the low x-scale. The only box that should be checked under the low y-scale is the axis line.
      2. Under Reference Lines, show reference lines for X positions of -1.96 and 1.96. Enter them both separated by a space. NOTE: When you come back to do the one-tail diagrams, these numbers will change and there will only be one of them.
    3. Choose Labels
      1. Enter a Title of "Two Tailed Hypothesis Test". NOTE: This will change for a one-tail test. Enter the appropriate title.
    4. Choose Data View
      1. Under Data Display, turn off the Symbols
    5. Click OK to generate the graph
  5. Now comes the fun part: we get to clean up the graph and annotate it! Make sure your graph is essentially correct before continuing. If you have to go back and re-generate the graph using a scatterplot, you will lose all annotations.
  6. Most of these involve editing a piece of the graph. There are a few ways to do this. You can 1) double left click the item, 2) position your mouse over the item, click the right mouse button, and choose Edit, or 3) left click on the item and then press Control-T on the keyboard. The instructions below will just say "edit this part of the graph" and you can use any of the methods to perform the edit.
    1. To clean up the labels we don't want, click on the "p" on the left side of the graph and delete it. Do the same thing for the "z" at the bottom of the graph.
    2. Edit the normal curve
      1. Change the line type to custom with a size of 3
    3. Edit the y axis scale on the left side of the graph.
      1. Under Scale, change the minimum scale range to 0 and the maximum scale range to 0.6.
      2. Under Show, uncheck all boxes. We don't want a vertical axis at all.
    4. Edit the x axis scale at the bottom of the graph
      1. Under Scale, change the minimum scale range to -3.5 and the maximum scale range to 3.5
      2. Under Attributes, change the line size to 2
      3. Under Font, change the font to Arial and the font size to 12
    5. Edit the reference line at 1.96. (NOTE: This number will change for the one tail tests)
      1. Under Attributes, select a Custom line type of dashed lines with a thickness of 2. Check the box that says to apply attributes to all reference lines to save having to do this to the other line at -1.96
      2. Under text, change it from 1.96 to "Critical Value z=1.96"
      3. Under font, change the font to Arial, the color to red and the font size to 12. Check the box to apply font changes to all reference lines
      4. Under alignment, set the text angle to 90 degrees, choose a custom position and make it "Below, to the left"
    6. Edit the reference line at -1.96 (NOTE: You will not have a second reference line for a one tail test).
      1. Under text, change the text to "Critical Value z=-1.96"
      2. Under alignment, set the text angle to 90 degrees, choose a custom position and make it "Below, to the right"
    7. Edit the title and change the font to Arial and the font size to 20 points.
  7. At this point, we are ready to start adding things to the graph, rather than just changing what is there. There is a Graph Annotation Tools toolbar that is active while you're editing a graph. If the toolbar does not appear on your screen, then go to Tools / Toolbars / Graph Annotation Tools and turn it on. Until this point, we have been using the Select mode (the arrow), but now we're going to use the Text mode (the T) or the Line mode (the line). Click on the appropriate mode to insert the right object. Here are generic instructions for each type of object.
  8. Insert the horizontal lines for the areas. You will need three of them, one for the left portion, one for the middle, and one for the right. For each line, do the following.
    1. Change the style to small dashes, the color to dark green, the thickness to 2 and add arrows where appropriate.
    2. Use the mouse or arrow keys to carefully align the lines with each other
  9. Insert the labels for "Critical Region", "Non-Critical Region", "Retain Ho", and "Reject Ho". There is not an easy way to put in an H0, so just use a lowercase O so it is close.
    1. Change the font to Arial
    2. Change the font size to 14.
  10. The labels for the areas require changing fonts and your output won't look exactly like mine because I made it much harder than it needs to be.
    1. Insert text that says "1-a=0.95" and "a/2=0.025". That is, type the letter a instead of an alpha.
    2. Then edit the text and set the font to Symbol and make the size 12. In the Symbol font, the a displays as an alpha.
  11. Don't close your graph out. Make sure you save it as part of your project. But when you're all done, copy and paste it into Word.
  12. Now repeat the same thing for the left and right tailed hypothesis tests. Remember that you will only have one critical value for those and it won't be at ±1.96 any more.
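
To see the numbers behind the graph, here is a short Python sketch that builds the same z and p columns and looks up the critical values. It is only a check, not a replacement for the Minitab graph. The two-tailed value uses the 0.05 significance level implied by the "1-a=0.95" label. Using the same 0.05 level for the one-tailed tests is an assumption; use whatever level your assignment specifies.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# z column: -3.5 to 3.5 in steps of 0.1 (rounded to avoid floating-point drift)
z = [round(-3.5 + 0.1 * i, 1) for i in range(71)]

# p column: the probability density at each z
p = [std_normal.pdf(value) for value in z]

alpha = 0.05
two_tail_critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96, used as +/-
right_tail_critical = std_normal.inv_cdf(1 - alpha)    # assumes alpha = 0.05
left_tail_critical = std_normal.inv_cdf(alpha)         # assumes alpha = 0.05

print(len(z), z[0], z[-1], round(max(p), 4))
print(round(two_tail_critical, 2))
print(round(right_tail_critical, 3), round(left_tail_critical, 3))
```

The highest density is just under 0.4, which is why a maximum of 0.6 on the y scale leaves room above the curve for the annotations.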