Technology Exercise 4: Inferences When Variables Are Related
M&M Colors (Question 1)
Finding the claimed proportions
- Go to the M&M website at http://www.mms.com/
- Click on About M&Ms
- Choose Products
- Choose Peanut M&Ms
- The claimed percentages are given, but the web page is not accessible. If
you're color blind, have your partner read the page for you, and send a note
to M&M Mars letting them know that their page isn't accessible.
Calculating the Test Statistic
Minitab doesn't do a chi-square goodness of fit test. You should do this part
by hand.
Set up a table something like this. This table should be part of your Word output.
| | Red | Orange | Yellow | Green | Blue | Brown | Total |
|---|---|---|---|---|---|---|---|
| Claimed Percent | | | | | | | 100% |
| Observed | | | | | | | |
| Expected | | | | | | | |
| (Obs - Exp) | | | | | | | 0 |
| (Obs - Exp)² | | | | | | | ----- |
| (Obs - Exp)²/Exp | | | | | | | |
Multiply each claimed percent (written as a decimal, so 20% becomes 0.20) by
the total number of observed M&Ms to find the expected count for each color.
The test statistic is the sum of the last row.
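If you want to check your hand arithmetic, the table calculation can be sketched in a few lines of Python. This is only an illustration: the observed counts and claimed percents below are made-up numbers, not the values from the M&M site or from your bag, and the function name is hypothetical.

```python
def chi_square_gof(observed, claimed_percent):
    """Return the Expected row, the (Obs - Exp)^2/Exp row, and their sum."""
    total = sum(observed)
    # Expected count = claimed proportion times the total number observed.
    expected = [total * pct / 100 for pct in claimed_percent]
    # One term per color: (Obs - Exp)^2 / Exp.
    terms = [(obs - exp) ** 2 / exp for obs, exp in zip(observed, expected)]
    return expected, terms, sum(terms)


# HYPOTHETICAL data in the order Red, Orange, Yellow, Green, Blue, Brown.
observed = [18, 22, 30, 12, 28, 10]
claimed_percent = [20, 20, 20, 10, 20, 10]  # must add up to 100

expected, terms, statistic = chi_square_gof(observed, claimed_percent)
print("Expected:", expected)
print("Test statistic:", round(statistic, 4))
```

The differences (Obs - Exp) should always add up to 0, which is a quick check that the expected counts were computed correctly.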
Finding the P-Value
Your test statistic has a χ² distribution with df = (number of categories) - 1.
It is a right tail test, so the p-value is the area to the right of the test
statistic.
- Choose Calc / Probability Distributions / Chi-Square
- The Cumulative Probability is the area to the left of the value, so check it
and we'll adjust to find the area on the right later. Do NOT change the non-centrality
parameter. Leave it at 0.
- Enter the degrees of freedom
- Click the Input constant box
- Enter your test statistic for the constant.
The number that Minitab gives you will be the area to the left of the test statistic,
but you need the area to the right. Make the appropriate adjustment.
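As a cross-check on the Minitab steps above, here is a rough Python sketch of the same idea: compute the area to the left, then subtract it from 1 to get the area to the right. It uses a standard series for the chi-square cumulative probability so that it needs nothing beyond the standard library. The function names are hypothetical, and the test statistic in the example is a made-up value.

```python
import math


def chi_square_cdf(x, df):
    """Area to the LEFT of x under a chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a = df / 2.0
    half_x = x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))


def right_tail_p_value(test_statistic, df):
    """Area to the RIGHT of the test statistic (the p-value for this test)."""
    return 1.0 - chi_square_cdf(test_statistic, df)


# HYPOTHETICAL test statistic; six colors give df = 6 - 1 = 5.
p_value = right_tail_p_value(4.1667, 5)
print("p-value:", round(p_value, 4))
```

The series is fine for the moderate test statistics you are likely to see here; for very large values the subtraction from 1 loses precision.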
Creating a similar graph to Tech project 3
Follow the instructions for question 3 on technology project 3, except make these changes.
- The underlying distribution is chi-square, not normal.
- Instead of z going from -3.5 to 3.5, let x go from 0 to 15 or so. The 15 is arbitrary.
Make sure it is big enough to cover your test statistic, so if your test
statistic is 30, you'll need to go out at least that far. You may want to
change your step size to 0.2, but it will probably be fine at 0.1 (a smoother
curve).
- This is a right tail test, so only put the critical value (from the table) on
the right side.
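The column of x values and the matching chi-square curve heights can be sketched in Python as follows. This is an illustration of the calculation only, not a replacement for the Minitab graph; the function names are hypothetical, and the defaults of 15 and 0.1 simply mirror the suggestions above.

```python
import math


def chi_square_pdf(x, df):
    """Height of the chi-square density curve at x."""
    if x < 0:
        return 0.0
    if x == 0:
        # The curve starts at 0 for df > 2, at 0.5 for df = 2,
        # and is unbounded at 0 for df < 2.
        if df < 2:
            return math.inf
        return 0.5 if df == 2 else 0.0
    half_df = df / 2.0
    log_pdf = ((half_df - 1) * math.log(x) - x / 2.0
               - half_df * math.log(2) - math.lgamma(half_df))
    return math.exp(log_pdf)


def density_columns(df, x_max=15.0, step=0.1):
    """Return the x column (0 to x_max) and the matching density column."""
    n_steps = round(x_max / step)
    xs = [i * step for i in range(n_steps + 1)]
    ys = [chi_square_pdf(x, df) for x in xs]
    return xs, ys


# Six color categories give df = 5.
xs, ys = density_columns(df=5)
print(len(xs), "points from", xs[0], "to", round(xs[-1], 1))
```

If your test statistic is larger than 15, pass a bigger `x_max` so the curve extends past it.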
Seatbelt Safety (Question 2)
Entering the Data into Minitab
Start a new worksheet for this problem.
- Label columns as year, seatbelt, and fatality
- Gather the seatbelt usage percents for 1985 through 2002 from the June
2003 Safety Belt Usage in Illinois. This information is from the Illinois
Department of Transportation and is at
http://www.dot.state.il.us/safetybeltjune2003.pdf. The data is in a chart, so
you'll have to read the percents from the top of the bars. The chart covers
1985 to 2003, but the fatality data in the next part only goes to 2002.
- Gather the fatality rates for 1985 through 2002 from the Illinois
2002 Toll of Motor Vehicles Crashes page from the National Highway Traffic
Safety Administration at
http://www.nhtsa.dot.gov/STSI/State_Info.cfm?Year=2002&State=IL. There is a
table toward the bottom of the page that is titled "Fatalities and Fatality
Rate per 100 Million VMT". Use the Total Fatality Rate column, not the
"Fatalities" column. The table covers 1982 through 2002, but the seatbelt
source only contains information from 1985 on.
Making the Fitted Line Plot
- Choose Stat / Regression / Fitted Line Plot
- The response variable is fatality
- The predictor variable is seatbelt
- Go into Storage and turn on the residuals
- Click OK
The Fitted Line Plot also shows the regression equation and the value
of r², the proportion of the variation in fatality that can be explained by
the regression model. The output in the Minitab session window gives much of
the same information, including an ANOVA table that contains the F test
statistic and the p-value that can be used to test for significant linear
correlation.
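To see where the numbers on the Fitted Line Plot come from, here is a minimal least-squares sketch in Python. It is an illustration under stated assumptions: the four data pairs are invented (they are not the Illinois values), and `fit_line` is a hypothetical helper, not something Minitab exposes.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, with r^2 and residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = actual value minus fitted value (what Minitab stores as RESI1).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    r_squared = 1 - sse / syy
    return slope, intercept, r_squared, residuals


# HYPOTHETICAL data: seatbelt usage (percent) and fatality rate.
seatbelt = [40, 50, 60, 70]
fatality = [2.0, 1.8, 1.5, 1.3]

slope, intercept, r_squared, residuals = fit_line(seatbelt, fatality)
print("fatality rate =", round(intercept, 4), "+", round(slope, 4), "seatbelt")
print("r-squared:", round(r_squared, 4))
```

The response variable (fatality) goes in as `y` and the predictor (seatbelt) as `x`, matching the dialog choices above.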
Checking for Significant Linear Correlation
While the p-value can be found from the ANOVA table, that table doesn't give
the value of r, the correlation coefficient.
- Choose Stat / Basic Statistics / Correlation
- The two variables are seatbelt and fatality (order doesn't matter)
- Click OK
The output gives you the correlation coefficient first and the p-value
second. The null hypothesis is that there is no linear correlation between
the two variables.
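The correlation coefficient and the test statistic behind that p-value can be sketched as below. The data are invented for illustration and the function name is hypothetical. The sketch stops at the t statistic and its degrees of freedom (n - 2); turning that into a p-value still requires a t table or Minitab. In simple linear regression, the square of this t statistic equals the F statistic in the ANOVA table.

```python
import math


def correlation_test_statistic(x, y):
    """Return Pearson's r, the t statistic for testing no correlation, and df."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    # Assumes the points are not perfectly on a line (r is not exactly +1 or -1).
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t, n - 2


# HYPOTHETICAL data, only to show the calculation.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r, t, df = correlation_test_statistic(x, y)
print("r =", round(r, 4), " t =", round(t, 4), " df =", df)
```

Swapping `x` and `y` gives the same r, which is why the order of the variables doesn't matter in the Correlation dialog.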
Checking Residuals for Normality
One of the assumptions in regression is that the residuals (the differences
between the actual value and the estimated value) have a normal distribution.
If you turned on the storage of residuals during the fitted line plot part,
then you should have a new variable called RESI1 that contains the residuals.
- Choose Stat / Basic Statistics / Normality Test (or choose Graph
/ Probability Plot / Single)
- The variable is RESI1
- Click OK
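A probability plot pairs each sorted residual with the normal score you would expect at that rank; if the residuals are roughly normal, the points fall close to a straight line. The sketch below computes those pairs. It is an assumption-laden illustration: the plotting positions (i - 0.375)/(n + 0.25) are one common choice and are not necessarily what Minitab uses, the residuals are made up, and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard normal score."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # ASSUMPTION: plotting positions (i - 0.375) / (n + 0.25).
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]
    return list(zip(scores, ordered))


# HYPOTHETICAL residuals standing in for the RESI1 column.
points = normal_plot_points([0.03, -0.01, 0.01, -0.03])
for score, residual in points:
    print(round(score, 3), residual)
```

This only produces the plot coordinates; it does not compute the p-value that Minitab's Normality Test reports.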
Give the Regression Equation
If you determined that there was significant linear correlation (positive
or negative) by rejecting the null hypothesis of no significant linear correlation,
then you should use the regression equation given by the computer. This was
found when you did the fitted line plot. Your equation should look something
like "fatality rate = 3.03814 - 0.0230872 seatbelt" (probably not that exactly).
If, however, you decided that there was no significant linear correlation
because you retained the null hypothesis of the correlation test, then you
should use the mean of y (y-bar) for the estimated equation. Your equation
should be something like "fatality rate = ####" where #### is the numerical
value of the mean of the fatality variable. You'll have to do descriptive
statistics to find out what that is.
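The decision between the two equations can be summarized in a short Python sketch. Everything here is illustrative: the slope, intercept, p-values, and fatality rates are made-up numbers, the function name is hypothetical, and the 0.05 significance level is an assumption (use whatever level your problem specifies).

```python
from statistics import fmean


def prediction_equation(slope, intercept, p_value, y, alpha=0.05):
    """Return a function that predicts y from x using the appropriate model."""
    if p_value < alpha:
        # Significant linear correlation: use the regression equation.
        return lambda x: intercept + slope * x
    # No significant linear correlation: use the mean of y for every x.
    y_bar = fmean(y)
    return lambda x: y_bar


# HYPOTHETICAL values, only to show both branches.
fatality = [2.0, 1.8, 1.5, 1.3]

use_line = prediction_equation(-0.024, 2.97, p_value=0.003, y=fatality)
use_mean = prediction_equation(-0.024, 2.97, p_value=0.40, y=fatality)

print(round(use_line(60), 4))  # regression equation evaluated at seatbelt = 60
print(round(use_mean(60), 4))  # y-bar, regardless of the seatbelt value
```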
Removing the Outlier
There was one year that had a really low seatbelt usage rate. Copy your
data into another worksheet and remove that row (that way you'll still have
the original data should you need to come back to it).
- Highlight all of the data including the variable names. Choose Edit
/ Copy or hit control-C.
- Choose File / New / New Worksheet
- Click in the label cell for C1 and choose Edit / Paste or press
control-V.
- Highlight the row to be deleted by clicking on the row number to
the left of the data.
- Press del (or right click and choose delete row).
Then repeat the analysis on the new worksheet, starting at the part that says
"Making the Fitted Line Plot".
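The same copy-then-delete idea looks like this in Python. The rows are invented (year, seatbelt, fatality) values and the function name is hypothetical; the sketch assumes the outlier is the single row with the lowest seatbelt value.

```python
def drop_lowest_seatbelt_row(rows):
    """Return a NEW list without the row that has the lowest seatbelt value."""
    outlier = min(rows, key=lambda row: row[1])
    # Build a copy so the original data stays available, as in the worksheet steps.
    return [row for row in rows if row is not outlier]


# HYPOTHETICAL rows of (year, seatbelt percent, fatality rate).
original = [
    (1985, 16, 2.1),
    (1986, 36, 2.0),
    (1987, 38, 1.9),
    (1988, 40, 1.9),
]

trimmed = drop_lowest_seatbelt_row(original)
print(len(original), "rows before,", len(trimmed), "rows after")
```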
There are a couple of ways to tell which is the better model. You can
look at the p-values (smaller means more significant results), the correlation
coefficients (larger in absolute value means stronger correlation), or the
value of r². Some of these might be the same, in which case you'll have to
look at something else.
Comment on which model is a better predictor of fatality rates in Illinois.