Math 113 - Inferential Statistics Project - 70 points

Your task is to perform some real-world inferential statistics. You will take a claim that someone has made, form a hypothesis from that, collect the data necessary to test the hypothesis, perform a hypothesis test, and interpret the results. There will be a written paper and an oral presentation to the class.

You may work in groups of up to three (3) persons. Pick people you can work with; part of your grade will be assigned by the other members of your group based on the work you contributed. Do not necessarily pick your friends; pick people who will do a good job. You may work alone, but a (very small) part of the final evaluation will be your ability to work as a team.

You need to submit a proposal defining what you wish to test and how you plan to test it. The instructor will review these proposals, make suggestions, and return them to you. If your group can't decide on a project or needs help defining it, see the instructor. You can also get ideas from reading newspapers or online news sites. Keywords like "average" or "more likely" are useful. pollingreport.com is a pretty good repository for finding claims about categorical data.

Make sure you get the project cleared with the instructor before you go collect the data. This is an introductory statistics course, not a graduate level course. The project should not cost you very much money to implement. It will take some time, however, and you should not wait until it's due to get started on it.

The instructor will keep a copy of your written paper.

Project Components

Project proposal

This is to make sure you're on the correct track before wasting lots of time collecting useless information. Your proposal must be in narrative format (not a bulleted list), typed, double-spaced, and printed. Wait for approval from the instructor before you begin your project.

These items should be in your proposal.

A list of the group members with correct spellings of first and last names
The title of the project
An explanation of why you find the topic interesting
The claim being tested and where you found it (cite your sources)
An identification of the type of test: One proportion, two proportions, one mean, two independent means, or paired means
A timeline for completion of the project. Be sure to look at the calendar for when things are due, what else is going on in class, and plan accordingly.

Written report (50 points)

The written report should be in narrative format, as if you were writing a magazine article. It should be at least two double-spaced pages long with standard margins and font sizes and should be printed.

Here are some specific items to have.

A separate title page
An introduction to the problem including the claim(s) being tested
The context (who, what, where, when, why, how) of the data (remember this is in narrative format) and any possible problems with collecting the data. Cite any references.
Descriptive statistics and/or tables depending on your type of data
Appropriate graphs (every project should have at least one graph or chart of the data in it)
Inferential statistics including (see sample section on website)
- the null and alternative hypotheses written symbolically
- statistical output including a test statistic and p-value
- a graph showing the critical and non-critical regions, test statistic, and p-value
- the decision and a conclusion written in terms of the original claim
Summary of the project.
Suggestions for the next time this project is done
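
The decision step in the inferential statistics item can be sketched in a few lines of Python. This is only an illustration: the 0.05 significance level and the p-value of 0.012 are assumptions made up for the example, and the function name decide is hypothetical. You still have to write the conclusion in terms of the original claim yourself.

```python
def decide(p_value, alpha=0.05):
    """Return the decision about the null hypothesis at significance level alpha."""
    # Assumed rule: reject H0 when the p-value is at most alpha.
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Made-up p-value, standing in for the one in your statistical output
decision = decide(0.012)
print(decision)
```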

Oral presentation (15 points)

This is an oral classroom presentation of 3-5 minutes on why you picked the project you did and what your results were. It is highly recommended that you create a PowerPoint slideshow to go with your presentation. You can also make transparencies or write on the board if needed. The class and/or instructor may ask questions about why you did something the way you did. These points will be assigned by the other class members as well as the instructor. You will be assigning point totals to the group as a whole, not to each individual member of the group. The grade you receive will be the average of the grades the class gives you. If you are not here for your presentation or the presentations of any of the other groups, you will receive a zero for this portion of the project.

Each group presentation will be rated as excellent, average, or poor in the areas of teamwork, effort in preparation for presentation, relevance of project, knowledge of project, and correct statistical usage.

Individual evaluations (5 points)

This is the only part of the project that is not a group grade; each person in the group needs to submit this individually. Your score will be a combination of the scores given to you by each member of the group and the instructor's evaluation of your evaluation.

Turn in a summary paragraph of what each person in the group (including yourself) did and how many points out of five you would give them for their effort. These evaluations should be sent as an email, with the evaluation in the body of the email rather than as a separate attachment. Be sure to include an appropriate subject line. The other students in the group will not see what you wrote about them, just the average score they got from all of the students.

You need to evaluate everyone in the group including yourself. If you're the only person in the group and did all of the work, you still need to evaluate yourself or you'll miss out on the participation grade.

When the instructor grades your evaluation, he is looking for things like the quantity and quality of material written about each person, whether the evaluation was submitted on time, whether the instructions were followed, etc.

What can we test?

Some things are easier to test than others. The purpose of this project is not to do a full-scale PhD-level research project; it is to expose you to the process of hypothesis testing in a real-world application.

You may test means or proportions. You may have one or two samples. Collecting information at multiple locations or at different times does not mean you have two samples. You have two samples when you compare the responses of two groups.

If you are dealing with one sample, then you will need some numerical value to test against. No subscripts are used with one sample, and you're only collecting the one piece of information needed to test the claim.

If you are dealing with two samples, then you do not need a numerical value to test against; instead, you compare the samples to each other. When you collect information, you need to collect at least two things: the group the person belongs to and the actual response to the question.
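
As a rough illustration of what to record, here is a short Python sketch. All of the values and group names are made up for the example. With one sample, each case is a single response; with two samples, each case carries both the group and the response, and the responses are split by group afterward.

```python
# One sample: only the response is recorded for each case (made-up values).
one_sample = [3.2, 2.9, 3.5, 3.1]

# Two samples: record the group AND the response for every case (made-up values).
two_sample = [
    ("smoker", 2.8),
    ("non-smoker", 3.4),
    ("smoker", 3.0),
    ("non-smoker", 3.1),
]

# Split the responses by group so the two groups can be compared to each other.
groups = {}
for group, response in two_sample:
    groups.setdefault(group, []).append(response)

print(groups)
```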

Categorical Data

If your data consists solely of categories and not measured quantities, then you should be looking at proportions or counts.

Things to look for that let you know you're dealing with categorical data or proportions include: proportions, percents, counts, frequencies, fractions, or ratios. If your data consists of names or labels, you're dealing with categorical data.

This list is only a guideline; counts can also be used as quantitative data. You really need to think about the response that was recorded for each case (a row in Minitab terms). Did you record a yes/no response for each case, or did you record a number that means something? If it was a yes/no response or other categorical data, then this is the place to be.

Example Claims about Categorical Data

93.1% of Americans feel there should not be nudity on television during children's viewing time. This is a claim about a single proportion. We know this because the value includes a percentage and the data is categorical (yes or no), not numerical. The original claim here could be written as p=0.931.
35% of people think Obama will very likely be able to improve economic conditions. Although this was originally presented as a Likert scale (very likely, somewhat likely, not very likely, not at all likely), we need to pick just one category to call success. This is a claim about one proportion and the original claim could be written as p=0.35.
Blacks are more likely to die from a stroke than whites. There are two groups here, blacks and whites. Success is defined as someone dying from stroke (as opposed to something else). This is a test of two independent proportions and your original claim could be written as p_b>p_w.
Smokers are more likely to be male than female. There is only one group here, smokers. Success is defined as someone being male, which means that more than half of the sample is male. We're testing one proportion and the original claim could be written as p>0.50.
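
For claims like these, the test statistic and p-value can be sketched in Python using only the standard library. This is a minimal sketch, not the course's required method: it assumes the usual large-sample z tests for proportions, the function names are hypothetical, and all of the counts are made up. Your Minitab output remains the statistical output to report.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """z statistic and two-tailed p-value for H0: p = p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error assuming H0 is true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and right-tailed p-value for H0: p1 = p2 vs H1: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion assuming H0 is true
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Made-up survey: 62 of 200 respondents said "very likely"; the claim is p = 0.35
z1, p1_value = one_proportion_z(62, 200, 0.35)
print(f"one proportion: z = {z1:.3f}, p-value = {p1_value:.3f}")

# Made-up counts: 30 of 100 successes in group 1 and 20 of 100 in group 2
z2, p2_value = two_proportion_z(30, 100, 20, 100)
print(f"two proportions: z = {z2:.3f}, p-value = {p2_value:.3f}")
```

A claim with "more likely" or "less likely" leads to a one-tailed test, as in the second function; a claim of equality to a stated value is two-tailed, as in the first.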

Quantitative (Numerical) Data

If your data consists of measured quantities, then you will probably be testing means.

There are three main ways to analyze one or two means.

A test about a single mean that requires a number as the claimed value.
A test about two independent means doesn't need a number because you compare them to each other. This compares the same thing in two different groups.
A test for two dependent means, often called paired samples, compares two values for each case in the same group.

Example Claims about Quantitative Data

Americans had sex an average of 111 times a year. This is a claim about a mean (average). Even though this seems like a count, the number of times you have sex a year is a number, not a category. The original claim would be written as μ=111.
Women live five years longer than men. This is a claim about two averages and there are two groups, women and men. The original claim here could be written as μ_w-μ_m=5 (the difference in the mean life spans of women and men is 5 years).
The rates paid by the insurance company for dental cleaning are in line with the rates charged by the dentists. This is a paired samples t-test where the claim starts off as "paid = charged" and turns into μ_d=0 where d=rate paid - rate charged.
Cub Foods and Krogers charge the same amount. If the same items at each store are compared, then this is a paired samples t test with a claim of μ_d=0 where d=Cub Foods - Krogers. If different items are compared at each store, then this is a two sample t test and the claim is μ_c=μ_k.
The GPA of smokers is lower than the GPA of non-smokers. This is a two sample t test with a claim of μ_s<μ_ns.
Patrons at Cheddar's tip 15%. This is a paired samples t-test with a claim of μ_d=0 where d = tip - 15% of bill.
There are more absences on Fridays than on Mondays. This is a two sample t test with a claim μ_f>μ_m.
The average drive-through time at Culver's is less than 270 seconds. This is a one sample t test with a claim of μ<270.
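
The three kinds of t tests named in these examples can be sketched in Python with the standard library. This is a sketch under stated assumptions: the function names are hypothetical, all data values are made up, and the two sample version uses the unpooled (Welch) form, which may or may not match the option chosen in class. Only the test statistic and degrees of freedom are computed here; the p-value would come from Minitab or a t table.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t, n - 1

def paired_t(first, second):
    """Paired samples: a one sample t test on the differences, H0: mu_d = 0."""
    diffs = [a - b for a, b in zip(first, second)]
    return one_sample_t(diffs, 0)

def two_sample_t(x, y):
    """Unpooled (Welch) t statistic for H0: mu_x = mu_y, with approximate df."""
    vx = stdev(x) ** 2 / len(x)
    vy = stdev(y) ** 2 / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Made-up drive-through times in seconds; the claim is mu < 270
times = [250, 262, 281, 240, 255, 268, 259, 247]
t_one, df_one = one_sample_t(times, 270)

# Made-up prices for the same four items at two stores; the claim is mu_d = 0
store_a = [2.50, 3.00, 4.25, 1.75]
store_b = [2.25, 3.25, 4.00, 1.50]
t_pair, df_pair = paired_t(store_a, store_b)

# Made-up GPAs for two independent groups; the claim is mu_s < mu_ns
smokers = [2.8, 3.0, 2.5, 3.2]
non_smokers = [3.1, 3.4, 2.9, 3.6, 3.3]
t_two, df_two = two_sample_t(smokers, non_smokers)

print(f"one sample: t = {t_one:.3f}, df = {df_one}")
print(f"paired:     t = {t_pair:.3f}, df = {df_pair}")
print(f"two sample: t = {t_two:.3f}, df = {df_two:.1f}")
```

Note that the paired test reuses the one sample function on the differences, which is why the paired claims above are written as μ_d=0.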