## Hypothesis Testing with SPSS

### Datasets

The following datasets will be used in this document.

CPSPUB-FEB2000
This dataset is a subset of the Current Population Survey - Basic Monthly Survey from February 2000. The dataset contains 13 variables and 6429 cases.
```
AGE       Age
EDUC      Education Level
RACE      Race
HISP      Hispanic/non-hispanic origin
MARITAL   Marital status
SEX       Sex
HOURS     Number of hours usually worked
OVERTIME  Amount of overtime earnings (weekly)
RATE      Hourly pay rate
UNION     Union member
STATE     State
METRO     Metropolitan status
CHILD     Number of own children <18 years of age
```
The data is available from http://ferret.bls.census.gov/

The second dataset consists of the exam scores for the Spring 2000 semester courses taught by James Jones as of 04/12/2000. There are 4 variables and 262 cases.
```
COURSE    Course Number
ID        Unique Identifier
EXAM      Exam Number
SCORE     Score on exam
```

The third dataset also consists of the exam scores for the Spring 2000 semester courses taught by James Jones as of 04/12/2000. There are 8 variables and 68 cases. The data is set up for comparison of dependent samples, with each exam score stored in its own variable.
```
COURSE    Course Number
ID        Unique Identifier
EXAM1     Exam 1 Score
EXAM2     Exam 2 Score
EXAM3     Exam 3 Score
EXAM4     Exam 4 Score
EXAM5     Exam 5 Score
EXAM6     Exam 6 Score
```

### Testing one mean

Analyze : Compare Means : One-Sample T Test

The One-Sample T Test procedure tests whether the mean of a single variable differs from a specified constant.

Example: Test the claim that the mean hourly rate of males is \$12.

Use the CPSPUB-FEB2000 dataset.

First, use Select Cases (sex=1) so that only the males are selected. Then, go to the One-Sample T Test, choose rate as the "Test Variable", and enter 12 as the "Test Value".

Be sure to go back to Select Cases and select all cases before doing other tests.
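
For readers who want to see the arithmetic behind the procedure, here is a minimal Python sketch of a one-sample t statistic. It is not SPSS output: the function name `one_sample_t` and the handful of (sex, rate) rows are made up for illustration and are not taken from CPSPUB-FEB2000.

```python
import math
import statistics

def one_sample_t(values, test_value):
    """Return (t, df) for the null hypothesis that the mean equals test_value."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - test_value) / se, n - 1

# Hypothetical (sex, rate) rows; sex = 1 is male, as in the example above.
rows = [(1, 10.0), (2, 9.0), (1, 12.0), (1, 14.0), (2, 11.0), (1, 16.0), (1, 18.0)]

# Equivalent of Select Cases (sex=1): keep only the males.
male_rates = [rate for sex, rate in rows if sex == 1]

t, df = one_sample_t(male_rates, 12)
print(f"t = {t:.3f}, df = {df}")
```

The filter step plays the role of Select Cases, and the test value 12 matches the "Test Value" entered in the dialog.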

### Testing two means

#### Independent Case

Analyze : Compare Means : Independent-Samples T Test

The Independent-Samples T Test procedure compares means for two groups of cases.

Example: Test the claim that men make more per hour than women.

Use the CPSPUB-FEB2000 dataset.

What we're really asking is whether the mean hourly rate of men is higher than the mean hourly rate for women.

Go to the Independent-Samples T Test and choose "Test Variable" of rate and the "Grouping Variable" to be sex. You must then define groups for the grouping variable. In this case, use 1 (male) and 2 (female) as the groups.

Note that the grouping variable used here has only two values (male, female), but a grouping variable may have more than two values. The test still compares only two groups at a time. For example, if you wanted to compare hourly rates by race, whites vs Asians, you could define your groups to be 1 and 4.
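
The comparison of two independent groups can be sketched in Python as follows. This is an illustration under assumptions, not a reproduction of SPSS: the function `independent_t` and the sample rates are hypothetical. It computes the t statistic both with a pooled variance (equal variances assumed) and with separate variances (Welch), since either may be appropriate depending on the data.

```python
import math
import statistics

def independent_t(group1, group2):
    """Return pooled and Welch t statistics with their degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)

    # Equal variances assumed: pool the two sample variances.
    df_pooled = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch statistic and Satterthwaite df.
    a, b = v1 / n1, v2 / n2
    t_welch = (m1 - m2) / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return {"t_pooled": t_pooled, "df_pooled": df_pooled,
            "t_welch": t_welch, "df_welch": df_welch}

# Hypothetical hourly rates for group 1 (male) and group 2 (female).
men = [12.0, 14.0, 16.0, 18.0]
women = [10.0, 11.0, 12.0, 13.0, 14.0]

result = independent_t(men, women)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Because the difference is computed as group 1 minus group 2, a positive t points in the direction of the claim that men make more per hour than women.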

#### Dependent Case

Analyze : Compare Means : Paired-Samples T Test

The Paired-Samples T Test procedure compares the means of two variables for a single group. It computes the difference between the values of the two variables for each case and tests whether the average difference differs from 0.

Example: Did the Math 113 students do better on the second exam than the first exam?

You must Select Cases so that only the Math 113 (course=1) students are selected. Then go to Paired-Samples T Test. Choose exam1 and exam2 for the "paired variables".

Note that any student who does not have both an exam 1 score and an exam 2 score will be omitted from the comparison.
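
The paired calculation, including the omission of incomplete pairs, can be sketched in Python. The function `paired_t` and the scores are hypothetical, and `None` stands in for a missing exam score. The difference is taken here as exam 2 minus exam 1 so that a positive value means improvement; this ordering is a choice made for the sketch, and reversing it only flips the sign of t.

```python
import math
import statistics

def paired_t(pairs):
    """Return (t, df, n_used) for paired scores, skipping incomplete pairs."""
    # Keep only students who have both scores.
    complete = [(first, second) for first, second in pairs
                if first is not None and second is not None]
    diffs = [second - first for first, second in complete]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1, n

# Hypothetical (exam1, exam2) scores; None marks a missing score.
scores = [(70, 75), (80, 82), (65, None), (90, 94), (None, 88), (75, 81)]

t, df, n_used = paired_t(scores)
print(f"t = {t:.3f}, df = {df}, pairs used = {n_used}")
```

Two of the six students are dropped because each is missing one score, which leaves four differences and 3 degrees of freedom.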

### Three or more means

Analyze : Compare Means : One-Way ANOVA

The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent variable by a single factor (independent) variable. Analysis of variance is used to test the hypothesis that several means are equal. This technique is an extension of the two-sample t test.

Example: Is there any difference in the mean Math 113 scores among the exams?

You must Select Cases so that only the Math 113 (course=1) students are selected. Then go to One-Way ANOVA. Choose score as the dependent variable and exam as the "Factor" (the independent variable). You may wish to go into options and turn on "Descriptive Statistics" and "Homogeneity of Variance".
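
As a rough illustration of what the analysis of variance computes, the following Python sketch builds the F ratio from between-group and within-group sums of squares. The function `one_way_anova` and the (exam, score) records are invented for this example; the records mimic the long layout of the dataset, with one row per exam score.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical (exam, score) records for one course.
records = [(1, 70), (1, 75), (1, 80),
           (2, 80), (2, 85), (2, 90),
           (3, 75), (3, 80), (3, 85)]

# Group the scores by the factor (exam number).
by_exam = {}
for exam, score in records:
    by_exam.setdefault(exam, []).append(score)

f, df_between, df_within = one_way_anova(list(by_exam.values()))
print(f"F({df_between}, {df_within}) = {f:.3f}")
```

A large F means the exam means differ by more than the spread within each exam would suggest.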

### One Proportion

Analyze : Nonparametric Tests : Binomial

The Binomial Test procedure compares the observed frequencies of the two categories of a dichotomous variable to the frequencies expected under a binomial distribution with a specified probability parameter. By default, the probability parameter for both groups is 0.5. To change the probabilities, you can enter a test proportion for the first group. The probability for the second group will be 1 minus the specified probability for the first group.

A "dichotomous" variable is one that can take only two values. A variable with more than two values can still be tested, though, as the example shows.

Example: Do at least 2/3 of the people work 40 hours or more per week?

Use the CPSPUB-FEB2000 dataset.

The claim is that at least 2/3 of the people work 40 hours or more, but the variable has to be set up dichotomously, with only two values. We can get around that by specifying a "cut point" where everything less than or equal to the cut point is in the first group and everything more than the cut point is in the second group. The probability specified is for the first group, not the second group.

So, go to Binomial and choose the "Test Variable List" to be hours. "Define Dichotomy" as a "Cut Point" of 39 (remember, we want 40 to be in the second group). Since the "Test proportion" is for the first group, and 40+ belong to the second group, we only want 1/3 of the data to be in the <40 range. So, put in 0.333 for the test proportion.
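
The cut point logic and the binomial probability can be sketched in Python. The helper names `dichotomize` and `binomial_upper_tail` and the hours values are hypothetical. The sketch computes the exact probability of seeing at least as many cases in the first group (hours <= 39) as were observed, if the true proportion in that group were 1/3; which tail SPSS reports is not assumed here.

```python
from math import comb

def dichotomize(values, cut_point):
    """Return (count <= cut_point, count > cut_point)."""
    first = sum(1 for v in values if v <= cut_point)
    return first, len(values) - first

def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical usual weekly hours for twelve workers.
hours = [40, 45, 38, 40, 20, 50, 40, 39, 60, 40, 35, 40]

# Cut point 39 puts 40 and above in the second group.
first_group, second_group = dichotomize(hours, 39)
n = first_group + second_group

# Test proportion 1/3 applies to the first group (fewer than 40 hours).
p_value = binomial_upper_tail(first_group, n, 1 / 3)
print(f"first group = {first_group}, second group = {second_group}, "
      f"P(X >= {first_group}) = {p_value:.4f}")
```

A small upper-tail probability would indicate that more than 1/3 of the workers fall below 40 hours, which is evidence against the claim that at least 2/3 work 40 hours or more.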

### Two or more Proportions (Test for Independence)

Analyze : Descriptive Statistics : Crosstabs

The Crosstabs procedure forms two-way and multiway tables and provides a variety of tests and measures of association for two-way tables. The structure of the table and whether categories are ordered determine what test or measure to use.

Example: Is the proportion of female union members the same as the proportion of male union members?

Use the CPSPUB-FEB2000 dataset.

While the textbooks describe a specific test for the comparison of two proportions, SPSS treats two proportions as a special case of the more generic contingency table. Asking if the proportions are the same is the same as asking if the variables union and sex are independent (that is, there is no difference).

Go to Crosstabs and select the two variables union and sex for the row and column. It doesn't matter which one is which, although if there are more than two choices, then you should make the variable with the most choices be the row variable - simply because it fits on the page better. Go into "Statistics" and turn on the chi-square tests.
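
To show what the chi-square test of independence measures, here is a small Python sketch of the uncorrected Pearson statistic for a contingency table. The function names and the counts are hypothetical, not taken from the dataset. The p-value helper uses a closed form that is valid only for 1 degree of freedom, which is the case for a 2 by 2 table.

```python
import math

def pearson_chi_square(table):
    """Return (chi_square, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: rows are union (member, non-member), columns are sex (male, female).
counts = [[30, 20],
          [70, 80]]

chi2, df = pearson_chi_square(counts)
print(f"chi-square = {chi2:.4f}, df = {df}, p = {p_value_df1(chi2):.4f}")
```

Swapping the rows and columns of the table leaves the statistic unchanged, which is why it does not matter which variable is the row and which is the column.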

### One variable, several proportions (Goodness of Fit Test)

Analyze : Nonparametric Tests : Chi-Square

The Chi-Square Test procedure tabulates a variable into categories and computes a chi-square statistic. This goodness-of-fit test compares the observed and expected frequencies in each category to test either that all categories contain the same proportion of values or that each category contains a user-specified proportion of values.

Example: Test the claim that 78% of the workers are white, 16% are black, 2% Indian, and 4% Asian.

Use the CPSPUB-FEB2000 dataset.

Go to Chi-Square and choose race as the "Test Variable". In this case, we do not want each category to have equal expected frequencies, so select "Values" under "Expected Values" and add 78, 16, 2, and 4. The numbers that you enter here can be percentages, expected frequencies, or any other set of weights. What matters is that they are in the correct proportion to each other and in the same order as the categories of the test variable (i.e., if Black was 1 and White was 2, you would need to switch the 78 and 16). You could put in 156, 32, 4, and 8 because they have the same ratios to their total as the original percentages. If you want to see the actual percentages, choose the "Frequencies" option.
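
The point about proportions can be checked with a short Python sketch. The functions `expected_counts` and `goodness_of_fit` and the observed counts are hypothetical. The sketch scales whatever weights are entered so that they sum to the sample size, which is why 78, 16, 2, 4 and 156, 32, 4, 8 lead to the same expected frequencies.

```python
def expected_counts(weights, n):
    """Scale the weights so the expected counts sum to n."""
    total = sum(weights)
    return [n * w / total for w in weights]

def goodness_of_fit(observed, weights):
    """Return (chi_square, df) comparing observed counts to the weighted expectation."""
    expected = expected_counts(weights, sum(observed))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Hypothetical observed counts in category order: white, black, Indian, Asian.
observed = [160, 28, 4, 8]

chi2_pct, df = goodness_of_fit(observed, [78, 16, 2, 4])
chi2_freq, _ = goodness_of_fit(observed, [156, 32, 4, 8])

print(expected_counts([78, 16, 2, 4], sum(observed)))
print(f"chi-square = {chi2_pct:.4f} (percent weights), "
      f"{chi2_freq:.4f} (frequency weights), df = {df}")
```

The order of the weights must follow the order of the category codes, so reordering the weights without reordering the observed counts would change the result.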