Technology Exercise 5: From Data at Hand to the World at Large

Constitutional Amendment (Question 1)

Generating 200 samples of size 1004

Don't bother labeling your columns; we'll refer to them as C1, C2, C3, etc. You can specify a range of columns as C1-C10 (to refer to the ten columns C1, C2, C3, ..., C10).

This problem satisfies the conditions of a binomial experiment: a fixed number of independent trials, each having exactly two possible outcomes. The number of trials is 1004, one person's answer doesn't affect another's, and there are two possible answers (opposed or not opposed). Note that "not opposed" isn't the same as "in favor of"; it might include those who are undecided or don't care. We need two categories, success and failure, so a success is someone who is opposed and a failure is everyone else.

Since this is a binomial problem, each of the 1004 people is a Bernoulli trial with a 0.64 probability of success.

  1. Choose Calc / Random Data / Bernoulli
    1. Generate 1004 rows of data
    2. Store them in C1-C200 (columns 1-200). This allows you to do all 200 trials at once. Pretty nifty, huh?
    3. The probability of success is 0.64.
    4. Click OK
  2. Choose Stat / Basic Statistics / Store Descriptive Statistics
    1. The variables are C1-C200
    2. Click on Statistics and check Sum. Uncheck everything else.
    3. Click OK. This command adds up the number of 1's in each column and creates 200 new columns called sum1, sum2, ..., sum200, each containing the number of successes for one of the 200 samples.
  3. Those 200 sums are in 200 separate columns, but we need them in a single column to work with them. This step fixes that. Choose Manip / Stack / Stack Columns.
    1. Stack the columns sum1-sum200 (or you could use C201-C400)
    2. Store the stacked data into a new worksheet. You can leave the name of the worksheet blank or give it a name like "amendment" to make it easier to find.
    3. Make sure the Use variable names in subscript column box is not checked.
    4. Click OK.
  4. The new worksheet contains two columns. The first, called Subscripts, contains the numbers 1 through 200. The second, C2, is unlabeled and contains the number of successes for each of the 200 samples. Label C2 as "x". In a binomial experiment, x represents the number of successes.
  5. Label a blank column as "p-hat". When dealing with proportions, p-hat (a p with a caret symbol over it) is the sample proportion.
  6. Choose Calc / Calculator
    1. Store the results into p-hat
    2. The expression is "x / 1004". The 1004 is our sample size.
    3. Click OK
  7. Close the original worksheet that had the 200 samples of sample size 1004 in it. This worksheet will cause the files to be so big that they use up the allocated space on the network and people can't save their files.
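
If you want to see what the Minitab steps above are doing, here is a minimal Python sketch of the same idea. It is not part of the assignment, and the function name and the seed are made up for illustration. It generates 200 samples of 1004 Bernoulli trials with success probability 0.64, counts the successes x in each sample, and divides by 1004 to get p-hat.

```python
import random

def simulate_sample_proportions(num_samples=200, n=1004, p=0.64, seed=0):
    """Return one sample proportion (p-hat) for each simulated sample."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so reruns match
    p_hats = []
    for _ in range(num_samples):
        # Each person is a Bernoulli trial: success (opposed) with probability p.
        x = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(x / n)  # same as the "x / 1004" Calculator step
    return p_hats

p_hats = simulate_sample_proportions()
print(len(p_hats), min(p_hats), max(p_hats))
```

The list p_hats plays the role of the p-hat column in the stacked worksheet. Your Minitab numbers will differ because the random draws differ.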

Graphical Summary

  1. Choose Stat / Basic Statistics / Display Descriptive Statistics
  2. The variable is p-hat
  3. Click on Graphs
    1. Check the Graphical Summary box
    2. Click OK
  4. Click OK

When you copy this into Word, be sure to drag the box a little bit bigger so that you can read all the numbers.

Be sure to close the worksheet that contains the data in C1-C200 and Sum1-Sum200. If you don't, the file will be about 75 times bigger than it needs to be and can easily fill up all of the allocated space on the network drive. When this happened, students were unable to save their work and lost everything they had done. So be sure to close that first worksheet before saving your file.
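
As a rough check on the graphical summary, the standard result for the sampling distribution of a sample proportion says the p-hat values should center near p with a standard deviation near the square root of p(1 - p)/n. The short Python sketch below (not part of the assignment) computes those two reference values for p = 0.64 and n = 1004.

```python
import math

p = 0.64   # probability of success used to generate the data
n = 1004   # size of each sample

expected_mean = p                         # center of the p-hat distribution
expected_sd = math.sqrt(p * (1 - p) / n)  # spread of the p-hat distribution

print(f"expected mean of p-hat: {expected_mean:.4f}")
print(f"expected st. dev. of p-hat: {expected_sd:.4f}")
```

The mean and standard deviation in your graphical summary should land close to these values, though not exactly on them, since you only have 200 random samples.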

Sampling Distribution of Sample Means (Question 2)

Create a probability distribution

A probability distribution is a list of all the values a random variable can assume with their associated probabilities.

Here are some sample probability distributions. Don't use these; make up your own.

The distribution for flipping 3 coins where x is the number of heads.

x       0     1     2     3
p(x)    1/8   3/8   3/8   1/8

The distribution for rolling 2 dice, where x is the sum of the values.

x       2     3     4     5     6     7     8     9     10    11    12
p(x)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

A made up distribution.

x       -2    1     4     9
p(x)    0.42  0.18  0.16  0.24

The important thing is that all of the probabilities are between 0 and 1 and the sum of all the probabilities is 1.
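
If you'd like to double-check a distribution before using it, the two conditions can be tested with a few lines of Python. This is only a sketch, and the function name is made up for illustration. It checks the three sample distributions shown above.

```python
import math
from fractions import Fraction

def is_valid_distribution(probs):
    """True if every probability is between 0 and 1 and they sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    # isclose allows for tiny rounding error when the values are decimals
    sums_to_one = math.isclose(float(sum(probs)), 1.0, abs_tol=1e-9)
    return in_range and sums_to_one

coins = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
dice = [Fraction(k, 36) for k in (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)]
made_up = [0.42, 0.18, 0.16, 0.24]

for name, probs in [("coins", coins), ("dice", dice), ("made up", made_up)]:
    print(name, is_valid_distribution(probs))
```

Replace one of the lists with your own probabilities to test your distribution.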

Now, make up your own probability distribution. Be sure to put your distribution into Word so we all know what distribution you're working with.

Finding the mean, variance, and standard deviation of your probability distribution.

Follow the examples from chapter 16, where we added an "x p(x)" row and an "x² p(x)" row to the distribution and then filled them in.
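
Here is a small Python sketch of that calculation, using the made-up sample distribution from above (don't use it for your own answer). It assumes the usual formulas: the mean is the sum of the x p(x) row, and the variance is the sum of the x² p(x) row minus the mean squared. The function name is made up for illustration.

```python
import math

def distribution_summary(xs, ps):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mean = sum(x * p for x, p in zip(xs, ps))         # total of the x p(x) row
    mean_sq = sum(x * x * p for x, p in zip(xs, ps))  # total of the x^2 p(x) row
    variance = mean_sq - mean ** 2
    return mean, variance, math.sqrt(variance)

xs = [-2, 1, 4, 9]
ps = [0.42, 0.18, 0.16, 0.24]
mean, variance, sd = distribution_summary(xs, ps)
print(round(mean, 4), round(variance, 4), round(sd, 4))
```

For this distribution the mean works out to 2.14, the variance to 19.2804, and the standard deviation to about 4.39. You still need to show the rows in Word for your own distribution.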

Entering your Distribution into Minitab

You will probably want to start a new worksheet for this problem.

  1. Label two columns as x and p
  2. Enter the values for x into the x column
  3. Enter the probabilities p(x) into the p column

Generating samples from your distribution

You are to generate 200 samples of size 500 from your distribution. We will do this similarly to the first problem on this tech exercise, but we'll use a different distribution to accomplish it.

  1. Choose Calc / Random Data / Discrete
    1. Generate 500 rows of data
    2. Store in the 200 columns C3-C202. Columns C1 and C2 are already used for the x and p, so we have to start with column 3. If you didn't start a new worksheet, you'll have to adjust your columns, so it's probably best if you start a new worksheet before doing this problem.
    3. The values are in x
    4. The probabilities are in p
    5. Click OK
  2. Choose Stat / Basic Statistics / Store Descriptive Statistics
    1. The variables to describe are in C3-C202
    2. Click on Statistics and make sure the only thing checked is mean
    3. Click OK. This will generate 200 new variables called Mean1, Mean2, ..., Mean200. Each column contains one number, the mean of the 500 values in the sample.
  3. Those 200 means are in 200 separate columns, and we need them in a single column. Choose Manip / Stack / Stack Columns
    1. The columns to stack are mean1-mean200
    2. Create a new worksheet. You can leave it unnamed or give it a name like "means" to make it easier to find.
    3. Make sure that the Use variable names in subscript column box is not checked.
    4. Click OK
  4. Label the C2 column of the new worksheet as "mean"
  5. Close the original worksheet that had the 200 columns of data in it. Otherwise, we'll run out of room on the network drive.
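
The steps above can be sketched in Python as follows. This is only an illustration, not part of the assignment: the function name and seed are made up, and the x and p values are the made-up sample distribution from earlier, not yours. It draws 200 samples of size 500 from the distribution and keeps the mean of each one.

```python
import random
import statistics

def simulate_sample_means(xs, ps, num_samples=200, n=500, seed=0):
    """Return the mean of each of num_samples random samples of size n."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so reruns match
    means = []
    for _ in range(num_samples):
        # Draw n values of x, each chosen with its probability p(x).
        sample = rng.choices(xs, weights=ps, k=n)
        means.append(statistics.fmean(sample))
    return means

xs = [-2, 1, 4, 9]
ps = [0.42, 0.18, 0.16, 0.24]
means = simulate_sample_means(xs, ps)
print(len(means), statistics.fmean(means), statistics.stdev(means))
```

The list means corresponds to the "mean" column in the stacked worksheet.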

Describing the distribution of sample means

  1. Choose Stat / Basic Statistics / Display Descriptive Statistics
    1. Describe the variable mean
    2. Click on Graphs and turn on graphical summary. Click OK
    3. Click OK

Compare your results to those of the Central Limit Theorem in the book. Be sure to address all the parts of the Central Limit Theorem: shape, mean, and standard deviation. Remember that the sample size was n=500. The number of samples (200) doesn't figure into the process, just the size of each sample.

Note that your results for the standard deviation may be off slightly because Minitab gives you the sample standard deviation and the Central Limit Theorem deals with the population standard deviation.
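
To get the numbers the Central Limit Theorem predicts, you need the mean and standard deviation of your distribution and the sample size. The Python sketch below (the function name is made up, and the values are the made-up sample distribution, not yours) computes the predicted mean of the sample means and the predicted standard deviation, which is the population standard deviation divided by the square root of n.

```python
import math

def clt_prediction(xs, ps, n):
    """Return the predicted (mean, standard deviation) of the sample means."""
    mu = sum(x * p for x, p in zip(xs, ps))
    sigma = math.sqrt(sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2)
    # Only the sample size n matters here, not the number of samples.
    return mu, sigma / math.sqrt(n)

predicted_mean, predicted_sd = clt_prediction([-2, 1, 4, 9], [0.42, 0.18, 0.16, 0.24], n=500)
print(round(predicted_mean, 4), round(predicted_sd, 4))
```

For the made-up distribution this gives a predicted mean of 2.14 and a predicted standard deviation of about 0.196. Compare the corresponding values for your own distribution to the mean and standard deviation in your graphical summary.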

Legalization of Marijuana (Question 3)

Use the data from the 2000 General Social Survey to answer this question. The question asked people was "Do you think the use of marijuana should be made legal or not?" and the results are stored in the variable "grass".

The claim that about 1/3 of the population believes it should be legal comes from all people who responded to the survey in 2000. We're going to see if the younger generation feels the same as all ages.

Gathering the Data

  1. Visit the General Social Survey website at http://www.icpsr.umich.edu:8080/GSS/homepage.htm
  2. Click on Analyze
  3. Select Frequencies or crosstabulation and then click Start
  4. Enter these values (without the quotes) on the screen that appears.
    1. The Row variable should be "grass"
    2. The Filter variable should be "year(2000) age(18-24)". The age(18-24) part limits the selection to just those people ages 18 to 24.
  5. Uncheck any percentaging boxes.
  6. Click Run the Table

Conducting the Hypothesis Test

This is a sample about a single population proportion. Since we're trying to test the proportion who feel that it should be legal, our success will be someone who thinks it should be legal. Don't let your personal feelings on the issue get in the way. Success is defined by what you're trying to test, not by how you feel.

  1. Choose Stat / Basic Statistics / 1 Proportion
  2. Click the Summarized data radio button
    1. The number of trials is the total number of respondents
    2. The number of successes is the number of people who feel it should be legal.
  3. Click the Options button
    1. Make sure the confidence level is set to 95 percent
    2. The test proportion should be 1/3. Enter it as 0.333333333 (put in a lot of threes)
    3. Check the Use test and interval based on normal distribution box
    4. Click OK
  4. Click OK

Take the results that Minitab gives you and use them to finish the hypothesis test.

Your explanation in Word should be a complete hypothesis test, not just the results from Minitab.
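
If you want to check Minitab's arithmetic by hand, here is a Python sketch of the usual normal-approximation test for one proportion. Everything specific in it is an assumption: the function name is made up, the counts 60 and 150 are placeholders and not the GSS results, and the p-value is two-sided. Use your own counts from the GSS table.

```python
import math

def one_proportion_z_test(successes, trials, p0):
    """Return (p_hat, z, two-sided p-value) using the normal approximation."""
    p_hat = successes / trials
    # The standard error uses the claimed proportion p0, as in the z test.
    se = math.sqrt(p0 * (1 - p0) / trials)
    z = (p_hat - p0) / se
    # Two-sided tail area under the standard normal curve.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_value

# Placeholder counts for illustration only; these are NOT the survey results.
p_hat, z, p_value = one_proportion_z_test(successes=60, trials=150, p0=1 / 3)
print(round(p_hat, 4), round(z, 3), round(p_value, 4))
```

The z and p-value here are only the numerical part. You still need to state the hypotheses, make the decision, and write the conclusion.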

Freedom of Speech (Question 4)

Use the data from the 2000 General Social Survey to answer this question. The data we're interested in is the proportion of black males and proportion of black females who believe a racist should be allowed to speak.

The question asked the people taking the survey was "Consider a person who believes that Blacks are genetically inferior. If such a person wanted to make a speech in your community claiming that Blacks are inferior, should he be allowed to speak, or not?" The answer to that question is contained in the variable "spkrac".

Gathering the data

  1. Visit the General Social Survey website at http://www.icpsr.umich.edu:8080/GSS/homepage.htm
  2. Click on Analyze
  3. Select Frequencies or crosstabulation and then click Start
  4. Enter these values (without the quotes) on the screen that appears.
    1. The Row variable should be "spkrac"
    2. The Column variable should be "sex"
    3. The Filter variable should be "year(2000) race(2)". The race(2) part limits the selection to just blacks.
  5. Uncheck any percentaging boxes.
  6. Click Run the Table

Conducting the hypothesis test with Minitab

This is a hypothesis test about the proportion of two independent populations (males and females are different). Since we're interested in the proportion of people who would allow the racist to speak, a success is when someone says they would allow the speech. Do not use your personal feelings about the matter when determining success or failure. Success is based on what you're trying to determine.

  1. Choose Stat / Basic Statistics / 2 Proportions
  2. Check the Summarized Data box
  3. Let the first sample be males and the second sample be females (or vice versa; it doesn't matter as long as you're consistent. I write it this way because that's the order the GSS gives it to us).
    1. Enter the total number of trials into the "trials" box and the number who would allow the speaker into the "successes" box.
    2. Do this for both samples (males and females).
  4. Check the Options settings
    1. Make sure the confidence level is at 95 percent
    2. Since we're testing that there is no difference, the test difference is 0 (no difference)
    3. This is a two tailed test, so the alternative should be not equal.
    4. Check the box to use the pooled proportion. We pool when the null hypothesis says the two population proportions are equal, which is the case here because the test difference is 0.
  5. Click OK

Take the results that Minitab gives you and use them to finish the hypothesis test.

Your explanation in Word should be a complete hypothesis test, not just the results from Minitab.
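
For reference, here is a Python sketch of the pooled two-proportion z test with a test difference of 0 and a not-equal alternative. The function name is made up and the counts are placeholders, not the GSS results. Substitute the trials and successes you read off the GSS table.

```python
import math

def two_proportion_pooled_z_test(x1, n1, x2, n2):
    """Return (p1_hat, p2_hat, z, two-sided p-value) using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion: all successes over all trials, as if one population.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided tail area under the standard normal curve.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1_hat, p2_hat, z, p_value

# Placeholder counts for illustration only; these are NOT the survey results.
p1_hat, p2_hat, z, p_value = two_proportion_pooled_z_test(x1=40, n1=80, x2=45, n2=120)
print(round(p1_hat, 4), round(p2_hat, 4), round(z, 3), round(p_value, 4))
```

Swapping which sample is first only changes the sign of z. The two-sided p-value stays the same.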

On another note, we looked at the proportion of black men and black women for this problem. You can run other tests. What might be more interesting to some people is to compare the proportion of whites to the proportion of blacks. If you wanted to do that, then you would use a column of "race" and filter of "year(2000)" on the form. If you do this analysis, you will find that there are differences between the races, not just in freedom of speech but in other issues like capital punishment (use the "cappun" variable) as well.