Technology Exercise 5: From Data at Hand to the World at Large
Constitutional Amendment (Question 1)
Generating 200 samples of size 1004
Don't bother to label your columns; we'll refer to them
as C1, C2, C3, etc.
You can specify a range of columns as C1-C10 (to refer to the ten columns C1,
C2, C3, ..., C10).
This problem satisfies the conditions of a binomial experiment: there is
a fixed number of independent trials, each having exactly two possible outcomes.
The number of trials is 1004, one person's answer doesn't affect another's,
and there are two possible answers (opposed or not opposed). Note that "not
opposed" isn't the same as "in favor of"; it might include those who are undecided
or don't care. We need two categories, success and failure, so success is
someone who is opposed and failure is everyone else.
Since this is a binomial problem, each of the 1004 persons is a Bernoulli
trial with a 0.64 probability of success.
- Choose Calc / Random Data / Bernoulli
- Generate 1004 rows of data
- Store them in C1-C200 (columns 1-200). This allows you to generate all 200
samples at once. Pretty nifty, huh?
- The probability of success is 0.64.
- Click OK
- Choose Stat / Basic Statistics / Store Descriptive Statistics
- The variables are C1-C200
- Click on Statistics and check Sum. Uncheck everything else.
- Click OK. What this command did was to add up the number of 1's in
each column and create 200 new columns called sum1, sum2, ..., sum200
that each contain the number of successes for each of the 200 samples.
- Unfortunately, those 200 sums are spread across 200 columns, and we need
them in a single column to work with them. This step will fix that. Choose Manip
/ Stack / Stack Columns.
- Stack the columns sum1-sum200 (or you could use C201-C400)
- Store the stacked data into a new worksheet. You can leave the name
of the worksheet blank or give it a name like "amendment" to
make it easier to find.
- Make sure the Use variable names in subscript column box is not checked.
- Click OK.
- The new worksheet contains two columns. C1 is called Subscripts and contains
the numbers 1 through 200. C2 is unlabeled and contains the number
of successes for each of the 200 samples. Label C2 as "x". In a
binomial experiment, x represents the number of successes.
- Label a blank column as "p-hat". When dealing with proportions,
p hat (a p with a caret symbol over it) is the sample proportion.
- Choose Calc / Calculator
- Store the results into p-hat
- The expression is "x / 1004". The 1004 is our sample size.
- Click OK
- Close the original worksheet that had the 200
samples of sample size 1004 in it. This worksheet will cause the files to
be so big that they use up the allocated space on the network and people
can't save their files.
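For readers who want to check the idea outside Minitab, the steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `simulate_p_hats` and the seed are made up for this sketch, and Python's random number generator will not reproduce Minitab's values.

```python
import random
import statistics

N_SAMPLES = 200     # number of simulated samples (columns C1-C200)
SAMPLE_SIZE = 1004  # people per sample (rows in each column)
P_SUCCESS = 0.64    # probability that a person is opposed


def simulate_p_hats(n_samples, sample_size, p, seed=None):
    """Return one sample proportion (p-hat) for each simulated sample."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(n_samples):
        # Each person is a Bernoulli trial: 1 = opposed, 0 = everyone else.
        # Summing the 1's gives x, the number of successes in the sample.
        x = sum(1 for _ in range(sample_size) if rng.random() < p)
        p_hats.append(x / sample_size)
    return p_hats


p_hats = simulate_p_hats(N_SAMPLES, SAMPLE_SIZE, P_SUCCESS, seed=1)
print(len(p_hats), round(statistics.mean(p_hats), 3))
```

The list `p_hats` plays the role of the stacked p-hat column: one proportion per sample, 200 in all.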
Graphical Summary
- Choose Stat / Basic Statistics / Display Descriptive Statistics
- The variable is p-hat
- Click on Graphs
- Check the Graphical Summary box
- Click OK
- Click OK
When you copy this into Word, be sure to drag the box a little bit bigger
so that you can read all the numbers.
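When reading the graphical summary, it can help to know roughly what to expect. A sample proportion from a binomial experiment has mean p and standard deviation sqrt(p(1 - p) / n). The short sketch below computes those two values for this exercise; the variable names are chosen just for this illustration.

```python
import math

p = 0.64   # probability of success used to generate the data
n = 1004   # size of each sample

expected_mean = p                            # center of the p-hat values
standard_error = math.sqrt(p * (1 - p) / n)  # spread of the p-hat values

print(round(expected_mean, 2), round(standard_error, 4))
```

The mean and standard deviation Minitab reports for p-hat should land near these values, though not exactly on them, because only 200 samples were drawn.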
Be sure to close the worksheet that contains the data in C1-C200 and Sum1-Sum200.
If you don't, the file will be about 75 times bigger than it needs to be and
can easily fill up all of the allocated space on the network drive. When this
happened, students were unable to save their work and lost everything they
had done. So be sure to close that first worksheet before saving your file.
Sampling Distribution of Sample Means (Question 2)
Create a probability distribution
A probability distribution is a list of all the values a random variable can
assume with their associated probabilities.
Here are some sample probability distributions. Don't use these; make up your
own.
The distribution for flipping 3 coins where x is the number of heads.
x | 0 | 1 | 2 | 3 |
p(x) | 1/8 | 3/8 | 3/8 | 1/8 |
The distribution for rolling 2 dice, where x is the sum of the values.
x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
p(x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
A made up distribution.
x | -2 | 1 | 4 | 9 |
p(x) | 0.42 | 0.18 | 0.16 | 0.24 |
The important thing is that all of the probabilities are between 0 and 1 and
the sum of all the probabilities is 1.
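The two requirements can be checked mechanically. The sketch below is one possible way to do it in Python; the function name `is_valid_distribution` is hypothetical, and exact fractions are used so that decimals like 0.42 are not thrown off by floating-point rounding.

```python
from fractions import Fraction


def is_valid_distribution(probabilities):
    """Return True if every probability is in [0, 1] and they sum to 1.

    Values are converted through str() to exact fractions so that decimals
    such as 0.42 are not affected by binary floating-point rounding.
    """
    probs = [Fraction(str(p)) for p in probabilities]
    in_range = all(0 <= p <= 1 for p in probs)
    return in_range and sum(probs) == 1


# The made-up distribution from the text.
print(is_valid_distribution([0.42, 0.18, 0.16, 0.24]))
# The three-coin distribution, written as fraction strings.
print(is_valid_distribution(["1/8", "3/8", "3/8", "1/8"]))
```

Probabilities such as 1/3 should be passed as the string "1/3" rather than as a rounded decimal, since a rounded decimal will not sum to exactly 1.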
Now, make up your own probability distribution. Be sure to put your distribution
into Word so we all know what distribution you're working with.
Finding the mean, variance, and standard deviation of your probability distribution.
Follow the examples from chapter 16, where we added an "x p(x)" row and an
"x² p(x)" row to the distribution and then filled them in.
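As a check on hand calculations, here is a small Python sketch using the three-coin distribution from above. It assumes the usual formulas (the mean is the sum of the x p(x) row, and the variance is the sum of the x² p(x) row minus the mean squared), which may be laid out differently in the book.

```python
from fractions import Fraction
import math

# Three-coin example from the text: x is the number of heads.
xs = [0, 1, 2, 3]
ps = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

mean = sum(x * p for x, p in zip(xs, ps))        # total of the x p(x) row
mean_sq = sum(x**2 * p for x, p in zip(xs, ps))  # total of the x^2 p(x) row
variance = mean_sq - mean**2
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 4))
```

Swapping in your own `xs` and `ps` lists gives the values to compare against your table.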
Entering your Distribution into Minitab
You will probably want to start a new worksheet for this problem.
- Label two columns as x and p
- Enter the values for x into the x column
- Enter the probabilities p(x) into the p column
Generating samples from your distribution
You are to generate 200 samples of size 500 from your distribution. We will
do this similarly to the first problem on this tech exercise, but we'll use
a different distribution to accomplish it.
- Choose Calc / Random Data / Discrete
- Generate 500 rows of data
- Store in the 200 columns C3-C202. Columns C1 and C2 are already used
for x and p, so we have to start with column 3. If you didn't start
a new worksheet, you'll have to adjust your columns, so it's probably best
if you start a new worksheet before doing this problem.
- The values are in x
- The probabilities are in p
- Click OK
- Choose Stat / Basic Statistics / Store Descriptive Statistics
- The variables to describe are in C3-C202
- Click on Statistics and make sure the only thing checked is mean
- Click OK. This will generate 200 new variables called Mean1, Mean2,
..., Mean200. Each column contains one number, the mean of the 500 values
in the sample.
- Unfortunately, those 200 means are in 200 separate columns and we need them
in a single column. Choose Manip / Stack / Stack Columns
- The columns to stack are mean1-mean200
- Create a new worksheet. You can leave it unnamed or give it a name
like "means" to make it easier to find.
- Make sure that the Use variable names in subscript column box is not
checked.
- Click OK
- Label the C2 column of the new worksheet as "mean"
- Close the original worksheet that had the 200 columns of data in it. Otherwise,
we'll run out of room on the network drive.
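The same procedure can be sketched in Python for anyone who wants to see it outside Minitab. The three-coin distribution stands in for your own, and the function name `simulate_sample_means` and the seed are assumptions made for this illustration only.

```python
import random
import statistics

# Stand-in distribution (three coins); replace with your own x and p columns.
xs = [0, 1, 2, 3]
ps = [1 / 8, 3 / 8, 3 / 8, 1 / 8]


def simulate_sample_means(values, probs, n_samples, sample_size, seed=None):
    """Draw n_samples samples from the distribution; return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # choices() draws with replacement, weighting each value by its probability.
        sample = rng.choices(values, weights=probs, k=sample_size)
        means.append(sum(sample) / sample_size)
    return means


means = simulate_sample_means(xs, ps, n_samples=200, sample_size=500, seed=1)
print(len(means), round(statistics.mean(means), 3))
```

The list `means` corresponds to the stacked "mean" column: 200 values, each the mean of one sample of size 500.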
Describing the distribution of sample means
- Choose Stat / Basic Statistics / Display Descriptive Statistics
- Describe the variable mean
- Click on Graphs and turn on graphical summary. Click OK
- Click OK
Compare your results to those of the Central Limit Theorem in the book. Be
sure to address all the parts of the Central Limit Theorem: shape, mean, and
standard deviation. Remember that the sample size was n=500. The number of
samples (200) doesn't figure into the process, just the size of each sample.
Note that your results for the standard deviation may
be off slightly because Minitab gives you the sample standard deviation and the
Central Limit Theorem deals with the population standard deviation.
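To get the values the Central Limit Theorem predicts, the mean of the sample means should equal the population mean, and their standard deviation should equal the population standard deviation divided by the square root of n. A minimal sketch, with the three-coin distribution as a stand-in and a hypothetical helper name `clt_prediction`:

```python
import math


def clt_prediction(values, probs, sample_size):
    """Return (mean, standard deviation) predicted for the sample means."""
    mu = sum(x * p for x, p in zip(values, probs))
    variance = sum(x * x * p for x, p in zip(values, probs)) - mu**2
    sigma = math.sqrt(variance)
    # Only the sample size n enters here, not the number of samples.
    return mu, sigma / math.sqrt(sample_size)


# Stand-in distribution (three coins); replace with your own.
xs = [0, 1, 2, 3]
ps = [1 / 8, 3 / 8, 3 / 8, 1 / 8]

predicted_mean, predicted_sd = clt_prediction(xs, ps, sample_size=500)
print(round(predicted_mean, 4), round(predicted_sd, 4))
```

Compare these two numbers with the Mean and StDev that Minitab reports for the "mean" column, and compare the histogram's shape with a normal curve.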
Legalization of Marijuana (Question 3)
Use the data from the 2000 General Social Survey to answer this question.
The question asked people was "Do you think the use of marijuana should be made
legal or not?" and the results are stored in the variable "grass".
The claim that about 1/3 of the population believes it should be legal comes from
all people who responded to the survey in 2000. We're going to see whether the younger
generation (ages 18 to 24) feels the same way as respondents of all ages.
Gathering the Data
- Visit the General Social Survey website at http://www.icpsr.umich.edu:8080/GSS/homepage.htm
- Click on Analyze
- Select Frequencies or crosstabulation and then click Start
- Enter these values (without the quotes) on the screen that appears.
- The Row variable should be "grass"
- The Filter variable should be "year(2000) age(18-24)".
The age(18-24) part limits the selection to just those people ages 18 to
24.
- Uncheck any percentaging boxes.
- Click Run the Table
Conducting the Hypothesis Test
This is a test about a single population proportion. Since we're trying
to test the proportion who feel that it should be legal, our success will be
someone who thinks it should be legal. Don't let your personal feelings on
the issue get in the way. Success is defined by what you're trying to test,
not by how you feel.
- Choose Stat / Basic Statistics / 1 Proportion
- Click the Summarized data radio button
- The number of trials is the total number of respondents
- The number of successes is the number of people who feel it should
be legal.
- Click the Options button
- Make sure the confidence level is set to 95 percent
- The test proportion should be 1/3. Enter it as 0.333333333 (put in
a lot of threes)
- Check the Use test and interval based on normal distribution box
- Click OK
- Click OK
Take the results that Minitab gives you and use them to finish the hypothesis
test.
Your explanation in Word should be a complete hypothesis test, not just the
results from Minitab.
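To see what the normal-based one-proportion test computes, here is a hedged Python sketch. The function name `one_proportion_z_test` is made up, a two-sided alternative is assumed (adjust if your hypotheses are one-sided), and the counts in the example call are placeholders, not GSS results.

```python
import math


def one_proportion_z_test(successes, trials, p0):
    """Normal-approximation z test of H0: p = p0 (two-sided alternative)."""
    p_hat = successes / trials
    # The standard error uses the claimed proportion p0, since H0 is assumed true.
    se = math.sqrt(p0 * (1 - p0) / trials)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_value


# Placeholder counts for illustration only; use the totals from your GSS table.
p_hat, z, p_value = one_proportion_z_test(successes=60, trials=150, p0=1 / 3)
print(round(p_hat, 3), round(z, 3), round(p_value, 4))
```

Minitab reports the same three quantities (sample p, Z-value, and P-value), so a sketch like this is mainly useful for understanding where they come from.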
Freedom of Speech (Question 4)
Use the data from the 2000 General Social Survey to answer this question.
The data we're interested in is the proportion of black males and proportion
of black females who believe a racist should be allowed to speak.
The question asked of the people taking the survey was "consider a person
who believes that Blacks are genetically inferior. If such a person wanted
to make a speech in your community claiming that Blacks are inferior, should he
be allowed to speak, or not?" The answer to that question is contained in
the variable "spkrac".
Gathering the data
- Visit the General Social Survey website at http://www.icpsr.umich.edu:8080/GSS/homepage.htm
- Click on Analyze
- Select Frequencies or crosstabulation and then click Start
- Enter these values (without the quotes) on the screen that appears.
- The Row variable should be "spkrac"
- The Column variable should be "sex"
- The Filter variable should be "year(2000) race(2)". The
race(2) part limits the selection to just blacks.
- Uncheck any percentaging boxes.
- Click Run the Table
Conducting the hypothesis test with Minitab
This is a hypothesis test about the proportions of two independent populations
(males and females are different). Since we're interested in the proportion
of people who would allow the racist to speak, a success is when someone says
they would allow the speech. Do not use your personal feelings about the matter
when determining success or failure. Success is based on what you're trying
to determine.
- Choose Stat / Basic Statistics / 2 Proportions
- Check the Summarized Data box
- Let the first sample be males and the second sample be females (or vice
versa; it really doesn't matter as long as you're consistent. I write it this
way because that's the way the GSS gives it to us).
- Enter the total number of trials into the "trials" box and the number
who would allow the speaker into the "successes" box.
- Do this for both samples (males and females).
- Check the Options settings
- Make sure the confidence level is at 95 percent
- Since we're testing that there is no difference, the test difference
is 0 (no difference)
- This is a two tailed test, so the alternative should be not equal.
- Check the box to use the pooled proportion. We use this when we think
the two populations share the same proportion, which is what the null
hypothesis of no difference assumes. Since we're only looking at
blacks, this is a reasonable assumption.
- Click OK
Take the results that Minitab gives you and use them to finish the hypothesis
test.
Your explanation in Word should be a complete hypothesis test, not just the
results from Minitab.
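The pooled two-proportion test can be sketched the same way. The function name `two_proportion_z_test` is hypothetical, the test difference is taken as 0 with a not-equal alternative as in the steps above, and the counts in the example call are placeholders rather than GSS results.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z test of H0: p1 = p2 against a two-sided alternative."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Under H0 both groups share one proportion, so combine the samples.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed
    return p1, p2, z, p_value


# Placeholder counts for illustration only; use the totals from your GSS table.
p1, p2, z, p_value = two_proportion_z_test(x1=60, n1=100, x2=40, n2=100)
print(round(p1, 2), round(p2, 2), round(z, 3), round(p_value, 4))
```

Swapping which group is first only flips the sign of z; the two-tailed p-value is unchanged, which is why the order doesn't matter as long as you are consistent.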
On another note, we looked at the proportion of black men and black women
for this problem. You can run other tests. What might be more interesting to
some people is to compare the proportion of whites to the proportion of blacks.
If you wanted to do that, then you would use a Column variable of "race" and
a Filter variable of "year(2000)" on the form. If you do this analysis, you
will find that there are differences between the races, not just in freedom
of speech but in other issues like capital punishment (use the "cappun" variable)
as well.