Math 113 - SPSS Project: Sure, everyone knows it, but is it true?

Name: _______________________________
Member 2: _______________________________
Member 3: _______________________________
Hospital #: ________________
Random Seed: ________________

There are "truths" that everyone knows and believes. For instance, everyone knows that "there are more births on a full moon." I recently mentioned the birthrate during a full moon to a friend, who promptly replied, "I know that's true. It's been proven."

Call them "urban legends" or "old wives' tales": there are many things that we "know" to be true without any statistical evidence to support them. That's statistical ignorance, and it's what we're going to examine for our semester project. We're going to take one of the claims that we've heard and see whether there is any statistical basis for it.

The Skeptic's Dictionary has this to say about full moons and other lunar effects: "The full moon has been linked to crime, suicide, mental illness, disasters, accidents, birthrates, fertility, and werewolves, among other things." [1] I'll stop there, although there is more information online. We're going to collect data on births of children and see if there is any correlation between birth rates and the phase of the moon.

Phase I - Group Selection

The instructor will divide the class into groups of about three students each based on criteria such as availability and computer expertise. Each group will be assigned a hospital from Central Illinois from which to collect birth information.

Most of the computer work can be done in class, but there may be some times that the groups will have to meet outside of class, so the instructor tries to group people based on open times.

Phase II - Collecting the Data (10 points)

The hospitals leave the information on their web sites for at least a month (some up to six months), so this is not something that you will have to check every day, but it is something that you should check at least every couple of weeks. There is usually a delay of a few days to a week before new information is posted.

We will be collecting data for all of January, February, and March 2002.

The student should not expect classroom time to collect the data. However, there should be sufficient time immediately prior to or following class to collect the data. The information can be entered into SPSS in either the S137 classroom or the C239 computer lab.

Each group will enter its data into a separate file, and the instructor will combine the information into a single file for the steering committee to use in the next phases of the project. The instructor will create a blank template file for you called "master.sav". This file will be located on the acad1 fileserver under the location "\\acad1\stats\01\births". When you choose "open" in SPSS, type in "\\acad1\stats\01\births" and you should see the file. Before entering your data, use the File / Save As command to save this file as "hosp#", where "#" is the number of your hospital. Do this only once; all subsequent data entry should be done in your "hosp#" file.

Caution! Only one student may edit the data file at a time. More than one student may open the file for use, but you should not make changes while other students have the file open. You will get a warning about the file being in use if you try this.

There will be a syntax file that contains the commands for computing the computed variables. You may run it on your data file at any time; it will not destroy existing data. The name of the syntax file will be "births.sps" and it will be located in the "\\acad1\stats\01\births" directory on the fileserver. To open this file, choose File / Open / Syntax from the menu. It should default to the same directory as the last opened data file, so you should be in the right place.

To execute the syntax file and compute the remainder of the data, choose Run / All from the syntax file window. Save your data file after running the syntax file. Do not save changes to the syntax file.

Phase III - Descriptive Statistics (30 points)

For your hospital only, answer the following questions. Annotate the output so that each section is clearly labeled, and add text describing anything interesting you observe. The SPSS command sequence is given to get you started in the right direction.

  1. Create a table describing the mean, median, and standard deviation for the length and weight for each sex of baby. Use Analyze / Compare Means / Means
  2. Give a five number summary for the length and weight for each sex of baby. You'll have to split the file (Data / Split File / Compare groups) by sex, then use frequencies (Analyze / Descriptive Statistics / Frequencies) to get the report, and then unsplit (Data / Split File / Analyze all cases) the file before moving on to the next question.
  3. Generate a joint frequency distribution with the phase of the moon for the rows and the sex of the child for the columns. Use Analyze / Descriptive Statistics / Crosstabs
  4. Find the 90% confidence interval for the true mean length and weight of newborn babies. Use Analyze / Descriptive Statistics / Explore (Generate statistics only, not plots).
  5. Generate a Histogram for the length of all babies. Does the data appear normally distributed? Use Graphs / Histogram
  6. Generate a Q-Q Plot for the weight of all babies. Does the data appear normally distributed? Remove the detrended plot from the output. Use Graphs / Q-Q
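
For checking SPSS output by hand, the first two questions can be sketched in a few lines of Python. This is only an illustration: the records below are made up, the function names are hypothetical, and quartile conventions differ between packages, so the quartiles may not match the SPSS Frequencies output exactly.

```python
import statistics

# Hypothetical records: (sex, length in inches, weight in pounds); 1=Boy, 2=Girl.
babies = [
    (1, 20.0, 7.50), (1, 21.0, 8.25), (1, 19.5, 6.75), (1, 20.5, 7.75), (1, 22.0, 9.00),
    (2, 19.0, 6.50), (2, 20.0, 7.25), (2, 19.5, 7.00), (2, 18.5, 6.00), (2, 20.5, 7.50),
]

def describe(values):
    """Mean, median, and sample standard deviation (question 1)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum (question 2)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default method is 'exclusive'
    return (min(values), q1, q2, q3, max(values))

# Report each sex separately, which is what splitting the file by sex does in SPSS.
for sex, label in ((1, "Boy"), (2, "Girl")):
    lengths = [length for s, length, _ in babies if s == sex]
    weights = [weight for s, _, weight in babies if s == sex]
    print(label, "length:", describe(lengths), five_number_summary(lengths))
    print(label, "weight:", describe(weights), five_number_summary(weights))
```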

Phase IV - Inferential Statistics (40 points)

Use the combined class data to answer the following questions. The combined classroom data will be in the file "births.sav" in the "\\acad1\stats\01\births" directory on the fileserver.

Since large sample sizes can make even the smallest differences significant, restrict your sample to a subset of the entire file. Set the random seed to the birthdate of the youngest member of your group in mmddyy format. For example, if the youngest person in your group was born on July 17, 1982, then you would enter 071782 for the random seed. The random seed can be found under Transform / Random Number Seed. This will guarantee that you will get the same random sample each time you do the analysis (so you can come back in case you don't get it all done the first time) and it will also give different responses for each group so we can look at different samples. Write the random number seed on the front sheet for future reference.

After you have set the random seed (you need to do this each time you open SPSS), randomly select approximately the given percent of cases. To do this, go to Data / Select Cases / Random sample of cases and enter a percent of the total sample size to randomly select. Each question has its own percent to sample. The sample sizes are based on the assumption of approximately 450 births per month between the eight hospitals for three months of data.
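
Under the assumption above (about 450 births per month for three months, or roughly 1,350 cases), sampling 5%, 10%, and 25% should give about 68, 135, and 338 cases. The idea of a seeded, approximate-percentage sample can be sketched in Python as follows. This is not SPSS's random number generator, so the cases selected will not match what SPSS picks for the same seed; the function name and the case numbers are hypothetical.

```python
import random

def approximate_sample(cases, percent, seed):
    """Keep each case independently with probability percent/100.

    The sample size is therefore only approximately percent% of the file,
    but the same seed always reproduces the same selection.
    """
    rng = random.Random(seed)
    p = percent / 100
    return [case for case in cases if rng.random() < p]

# Hypothetical case numbers for 450 births/month over three months.
cases = list(range(1, 1351))

# The seed 071782 from the example, written without the leading zero.
seed = 71782

first = approximate_sample(cases, 5, seed)
second = approximate_sample(cases, 5, seed)
print(len(first), first == second)
```

Resetting the seed before each selection is what makes the sample repeatable; drawing a second sample without resetting it would continue the random sequence and select different cases.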

Answer the following questions. Be sure to annotate your output so the questions are clearly identified and the answers to the questions are given.

Phase IVa

  1. Is there a difference in the mean weight and lengths of boy and girl babies? Sample 5% and use Analyze / Compare Means / Independent-Samples T Test.
  2. Are half of the children boys? Sample 5% and use Analyze / Nonparametric Tests / Chi-Square.
  3. Is there a linear correlation between the weight and length of a baby? If so, describe the regression equation that should be used to predict a baby's weight when you know its length. Sample 5% and use Analyze / Regression / Linear.
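
For question 3, the regression equation has the form weight = intercept + slope × length. A minimal least-squares sketch in Python, using made-up measurements and a hypothetical function name, shows where the slope, intercept, and correlation coefficient come from. It does not test significance; that still comes from the SPSS output.

```python
import math

def linear_fit(x, y):
    """Least-squares slope and intercept for predicting y from x, plus Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical sample: lengths in inches and combined weights in pounds.
lengths = [18.5, 19.0, 19.5, 20.0, 20.5, 21.0, 22.0]
weights = [6.00, 6.50, 7.00, 7.25, 7.75, 8.25, 9.00]

slope, intercept, r = linear_fit(lengths, weights)
print(f"weight = {intercept:.3f} + {slope:.3f} * length, r = {r:.3f}")
```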

Phase IVb

  1. According to the National Center for Health Statistics [2], births occur on the days of the week in the following proportions: Su=10.18%, Mo=14.51%, Tu=16.37%, We=16.05%, Th=15.67%, Fr=15.82%, Sa=11.40%. Test the claim that these percentages are correct. Sample 10% and use Analyze / Nonparametric Tests / Chi-Square.
  2. Is there a relationship between the phase of the moon and the number of births? Use the simplified lunar days (four days instead of eight phases) and assume that each of the four phases occurs with equal frequency. Select all cases and use Analyze / Nonparametric Tests / Chi-Square.
  3. Is there a relationship between the phase of the moon and the gender of the baby? Use all eight moon phases. Sample 25% and use Analyze / Descriptive Statistics / Crosstabs.
  4. Is the mean weight of the babies the same at each of the eight hospitals? Sample 10% and use Analyze / Compare Means / One-Way ANOVA.
  5. Is the mean length of babies the same at each of the four cities? Is the mean length of the boy and girl babies the same? Is there any interaction between the city and the sex of the child? Sample 5% and use Analyze / General Linear Model / Univariate.
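
For question 1, the chi-square goodness-of-fit statistic compares the observed count for each weekday with the count expected under the published percentages. The Python sketch below uses made-up observed counts and hypothetical function names. The p-value formula is a closed form that holds only for an even number of degrees of freedom, which works for the seven weekdays (df = 6) but not for the four lunar days in question 2 (df = 3).

```python
import math

# Weekday percentages quoted in question 1, Sunday through Saturday.
NCHS_PERCENT = [10.18, 14.51, 16.37, 16.05, 15.67, 15.82, 11.40]

def chi_square_gof(observed, expected_props):
    """Return the goodness-of-fit statistic and its degrees of freedom."""
    n = sum(observed)
    total = sum(expected_props)  # normalise in case the proportions are percentages
    expected = [n * p / total for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def p_value_even_df(stat, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Hypothetical observed counts for a sample of 135 births, Sunday through Saturday.
observed = [14, 20, 22, 21, 21, 22, 15]

stat, df = chi_square_gof(observed, NCHS_PERCENT)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value_even_df(stat, df):.3f}")
```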

Phase V - Evaluation

Project Evaluation (10 points)

As a group, comment on the entire project. As a minimum, address questions like

  1. Was there any data that was collected that wasn't needed?
  2. Was there data that wasn't collected, but it would have been nice to know?
  3. What changes would you make the next time this project is done?
  4. Was the project relevant / interesting?
  5. Are there topics that you would like to see investigated?

This should be typed and only one document per group submitted.

Individual Evaluations (10 points)

As an individual, evaluate each member of your group, including yourself. Comment on how much they contributed to the group. Did they show up for all the meetings? Did they participate when they showed up? Did they pull their weight, or did they not do anything?

In addition to a paragraph describing each person, assign them a score between 0 and 10 points for their effort in the group. Remember that you are also evaluating yourself.

The score that you receive for this part of the project will be the mean of the scores given to you by each person in your group. This may be handwritten and there should be one document per person.

Due Dates:

Tue, January 22 - Phase I
Groups assigned
Fri, February 22 - Phase III (30 points)
Descriptive Statistics. Use the January data only. If you have already entered portions of your February data, then choose Data / Select Cases / If and enter "xdate.month(date)=1" (without the quotes) as the selection criteria.
Tue, April 9 - Phase II
Data collection completed and saved into SPSS. You will need the January data entered before February 22 for Phase III, but you need to have all of your data collected and entered by this date.
Fri, April 19 - Phase IV (15 points)
Inferential Statistics questions 1-3 (Phase IVa).
Mon, May 6 - Phase IV (25 points)
Inferential Statistics questions 4-8 (the five questions of Phase IVb).
Thu, May 9 - Phase V (20 points)
Project and individual evaluations are due.

Data Definitions

Hospitals

The following hospitals will be used. The Hospital ID (used in "collected data" section) precedes each hospital name. The larger hospitals from surrounding areas have been chosen.

Bloomington
1.  Bromenn Healthcare - http://www.bromenn.org/newborngallery/newborngallery.cfm
2.  OSF St. Joseph - http://www.osfstjoseph.org/obstetrics/arrivals.dog
Decatur
3.  Decatur Memorial Hospital - http://www.dmhhs.org/births/
4.  St. Mary's Hospital - http://www.infantxpress.com/depot/20035/calendar
Peoria
5.  Methodist Medical - http://www.growingfamily.com/webnursery/hospitalpage.asp?hospitalID=3838
6.  OSF St. Francis - http://newborns.osfsaintfrancis.org/
Springfield
7.  St. John's Hospital - http://www.growingfamily.com/webnursery/hospitalpage.asp?hospitalID=6853
8.  Memorial Medical - http://www.growingfamily.com/webnursery/hospitalpage.asp?hospitalID=4193

Collected Data

For each child, collect the following information (if available).
hosp: Hospital ID (see above)
date: Date of birth (month, day, year)
time: Time of birth
sex: Gender of child (1=Boy, 2=Girl)
length: Length of child (in inches)
lbs: Weight in pounds
oz: Weight in ounces
name: Child's name*

* This information won't be analyzed; it is collected only to identify the baby so that the information isn't entered more than once. The parents' name(s) can be used to determine if there is a single parent listed for the single variable.

Computed Data

The following information will be computed. You do NOT need to record it as you enter the birth information into SPSS.
city: City (computed from hosp, 1=Bloomington, 2=Decatur, 3=Peoria, 4=Springfield)
weight: Combined weight in pounds (computed from lbs and oz)
phase: Lunar phase (computed from the date, 1=New, 2=Waxing Crescent, 3=1st Quarter, 4=Waxing Gibbous, 5=Full, 6=Waning Gibbous, 7=3rd Quarter, 8=Waning Crescent). See http://www.googol.com/moon/ for more information.
moon: Lunar days (computed from the date and time, 1=New, 2=1st Quarter, 3=Full, 4=3rd Quarter)
wkday: Day of week (computed from the date, 1=Sun, 2=Mon, ..., 7=Sat)
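
To make the codings above concrete, here is a rough Python sketch of how such variables could be computed. It is not the contents of "births.sps": the function names are hypothetical, and the lunar calculation is an assumption that counts mean lunar months from the new moon of January 6, 2000 (18:14 UTC). That approximation can be off by several hours, and it ignores time zones, so births close to a phase boundary may be coded differently than by the syntax file.

```python
from datetime import date, datetime, timezone

def city_from_hosp(hosp):
    """Hospitals 1-2 -> 1 (Bloomington), 3-4 -> 2, 5-6 -> 3, 7-8 -> 4."""
    return (hosp + 1) // 2

def combined_weight(lbs, oz):
    """Combined weight in pounds; there are 16 ounces in a pound."""
    return lbs + oz / 16

def weekday_code(d):
    """1=Sun, 2=Mon, ..., 7=Sat (isoweekday() is 1=Mon, ..., 7=Sun)."""
    return d.isoweekday() % 7 + 1

# Assumed reference point and mean length of a lunar (synodic) month.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)
SYNODIC_MONTH_DAYS = 29.530588853

def lunar_fraction(moment):
    """Fraction of the lunar cycle elapsed: 0 = new moon, 0.5 = full moon."""
    days = (moment - REFERENCE_NEW_MOON).total_seconds() / 86400
    return (days / SYNODIC_MONTH_DAYS) % 1.0

def phase_code(moment, bins=8):
    """bins=8 gives the 'phase' coding (1=New ... 5=Full ...);
    bins=4 gives the 'moon' coding (1=New, 2=1st Quarter, 3=Full, 4=3rd Quarter).
    Each bin is centred on its named phase."""
    return int(lunar_fraction(moment) * bins + 0.5) % bins + 1

# Hypothetical birth: hospital 5, 7 lbs 8 oz, born January 28, 2002.
born = datetime(2002, 1, 28, 22, 50, tzinfo=timezone.utc)
print(city_from_hosp(5), combined_weight(7, 8), weekday_code(born.date()),
      phase_code(born), phase_code(born, bins=4))
```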

References

  1. The Skeptic's Dictionary: A Guide for the New Millennium. © 2002. Robert Todd Carroll. http://skepdic.com/fullmoon.html
  2. National Vital Statistics Report, Vol. 49, No. 1, April 17, 2001. Table 15. Live births by race of mother and observed and seasonally adjusted birth and fertility rates, by month: United States, 1999. http://www.cdc.gov/nchs/data/nvsr/nvsr49/nvsr49_01.pdf