Math 113 - NY Times Book Best Seller List Project

This is a semester-long project involving many aspects of statistics. It can also serve as an example for the project that students will have to perform.

The class will be divided into groups of 2-4 students each. Each group will be assigned one store.

Periodically, information will be turned in for a grade. All assignments submitted for a grade should have the name of the bookstore and the names of all the students in the group at the top of the document.

All information submitted to the instructor is to be printed or typed. Sample output from the different parts of the project will be made available in Adobe PDF format as the semester progresses so that you will have some guidelines about how to proceed.

Parts of the project will be due at different times. Due dates and point values will be posted.

Phase 1 - Organization into Groups

This will take place very early in the semester, during either the first or second class period.

Exchange names, phone numbers, email addresses, and schedules with the other people in your group.

Phase 2 - Data Collection / Entry (20 points)

The list of books that you are given is the top 15 books from the NY Times Best Seller List for January 14, 2001. Take the list to your bookstore and find the prices for those books. Record the price that the store charges. If the store does not carry a book, then write N/A on the sheet.

Some stores carry more than one format of a book. Try to match the format (hard cover / paperback) and the retail price to the one shown so that we are all comparing the same thing.

After you have collected the data, open the file \\acad2\stats\01\books in SPSS and enter your data. If there were any books you couldn't find, then enter -1 for the price of that book; it will show up as N/A and will not be used in the calculations.
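To see what the -1 convention does to the calculations, here is a minimal Python sketch (not SPSS, and not part of the assignment). The function name and the prices are made up for illustration; only the -1 sentinel comes from the instructions above.

```python
from statistics import mean

MISSING = -1  # value the handout says to enter when the store does not carry a book

def mean_price(prices):
    """Mean of the prices, skipping the -1 sentinel; None if no prices remain."""
    valid = [p for p in prices if p != MISSING]
    return mean(valid) if valid else None

# Hypothetical prices for illustration only.
prices = [15.00, -1, 21.00, 12.00]
print(mean_price(prices))  # the -1 entry is ignored, so this prints 16.0
```

If the -1 were left in, it would be averaged as though the book cost -1 dollars, which is why it has to be treated as missing.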

After you have entered the data, save it as \\acad2\stats\01\storename, where "storename" is the name of your store (do not put the .com in the filename). The instructor will go through and merge everyone's data into one data set.

Phase 3 - Descriptive Statistics (20 points)

Each group should describe the data for their store only (apply a filter to the data using the select cases command).

Describe the following for the price of books at your store.

The output from SPSS should be cleaned up and annotated. After the entire group has checked the results, print and submit the report.

Phase 4 - Inferential Statistics (40 points)

The number in parentheses after each question is the section number from the textbook that the material covers.

Q1: Does the mean price of books at your store differ significantly from the overall mean price of books? (7.4)

Begin by finding the mean price of all the books for all the stores.

Then select your bookstore and compare the mean for your store to the numerical value found in the first step using a one-sample t-test.
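For reference, the arithmetic behind a one-sample t-test can be sketched in a few lines of Python. This is only an illustration of the formula t = (sample mean - overall mean) / (s / sqrt(n)); the function name, the prices, and the overall mean are hypothetical, and SPSS remains the tool for the project.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t, df) for the null hypothesis that the population mean is mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1

# Hypothetical numbers for illustration only.
store_prices = [14.50, 16.00, 12.75, 18.20, 15.10]
overall_mean = 16.40  # would come from the first step, using all stores
t, df = one_sample_t(store_prices, overall_mean)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The sketch stops at the test statistic; the p-value that SPSS reports comes from the t distribution with the same degrees of freedom.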

Q2: Does your store sell books at a 40% discount off the retail price? (8.2)

Select just your bookstore and then compare the price to the discount price using the paired samples t-test. Make this comparison for both the entire store and for each of the six categories of books (use split file).
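The paired samples t-test works on the book-by-book differences between the two prices. The following Python sketch shows that calculation under the assumption that books with a missing (-1) price have already been removed; the function name and all prices are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(prices, discount_prices):
    """Return (t, df) for the null hypothesis that the mean difference is zero."""
    if len(prices) != len(discount_prices):
        raise ValueError("each store price needs a matching discount price")
    diffs = [p - d for p, d in zip(prices, discount_prices)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical store prices and 40%-off prices for the same four books.
store = [15.60, 14.00, 17.50, 9.00]
discount = [15.00, 13.20, 16.50, 8.97]
t, df = paired_t(store, discount)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Because each book contributes one difference, the test is really a one-sample t-test on those differences against zero.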

Q3: Is there a significant difference in the mean prices of online stores and brick and mortar stores? (8.5)

Compare the price using the independent samples t-test. Compare the prices for the entire store and for each format of book (use split file).
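As an illustration of what the independent samples t-test computes, here is a Python sketch of the pooled-variance form. Using the pooled (equal variances) form is an assumption made for this sketch, and the function name and prices are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Return (t, df) for two independent samples, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df

# Hypothetical prices for illustration only.
online = [13.50, 15.00, 12.25, 16.80]
brick_and_mortar = [15.00, 17.50, 14.00, 18.95, 16.20]
t, df = pooled_t(online, brick_and_mortar)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

A negative t here would mean the first group (online) has the lower mean price.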

Q4: Is there a significant difference in the mean prices of the stores? (11.2)

This will be the same report by all of the groups.

Compare the prices for the entire store using the one-way ANOVA.

If there are significant differences in the mean prices, which stores are different (post-hoc tests)?

Which is the cheapest store? Which is the most expensive store?
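The F statistic that the one-way ANOVA reports is the ratio of the variation between store means to the variation within stores. A minimal Python sketch of that calculation follows; the function name, store labels, and prices are hypothetical, and it does not perform the post-hoc tests.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of store -> list of prices."""
    all_prices = [p for prices in groups.values() for p in prices]
    grand_mean = mean(all_prices)
    ss_between = 0.0
    ss_within = 0.0
    for prices in groups.values():
        group_mean = mean(prices)
        ss_between += len(prices) * (group_mean - grand_mean) ** 2
        ss_within += sum((p - group_mean) ** 2 for p in prices)
    df_between = len(groups) - 1
    df_within = len(all_prices) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical stores and prices for illustration only.
stores = {
    "Store A": [14.0, 15.5, 13.0],
    "Store B": [16.0, 17.5, 18.0],
    "Store C": [12.0, 13.5, 12.5],
}
f, df1, df2 = one_way_anova_f(stores)
print(f"F = {f:.3f} with ({df1}, {df2}) degrees of freedom")

# Descriptive only: the stores with the lowest and highest sample means.
cheapest = min(stores, key=lambda s: mean(stores[s]))
priciest = max(stores, key=lambda s: mean(stores[s]))
print(cheapest, priciest)
```

Comparing sample means only says which store was cheapest in the data; whether two stores differ significantly is what the post-hoc tests address.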

Here is sample output for questions 1-3 and question 4.

Phase 5 - Project Evaluations (20 points)

Group Evaluation of Project (10 points)

As a group, comment on the entire project. Address questions like

This should be typed and only one document per group submitted.

Individual Evaluation (10 points)

As an individual, evaluate each member of your group, including yourself. Comment on how much each person contributed to the group. Did they show up for all the meetings? Did they participate when they showed up? Did they pull their weight, or did they not do anything?

In addition to a paragraph describing each person, assign them a score between 0 and 10 points for their effort in the group. Remember you are evaluating yourself, also.

* The score that you receive for this part of the project will be the mean of the scores given to you by each person in your group.

This may be handwritten and there should be one document per person.

Spring 2001 Deadlines

Date          Description
Thu, Jan 18   Groups assigned
Fri, Feb 9    Data collected and entered into SPSS (20 points)
Fri, Mar 2    Descriptive Statistics Due (20 points)
Thu, Apr 12   Inferential Statistics Questions 1-3 Due (30 points)
Thu, May 10   Project Evaluations Due (20 points)
Fri, May 11   Inferential Statistics Question 4 Due (10 points)


1. -

2. BN.Com - (Barnes & Noble - online)

3. - (Kmart's online store)*

4. -

5. Barnes & Noble (Springfield)

6. WaldenBooks (Forsyth)

7. WalMart

* The instructor has already gathered this data and will use it for generating sample output.

Data Structure

The following variables will be collected for each book.

Entered data

Book store. Each store has been assigned a numeric value as shown in the table above. Enter the number for the store.
Each book will be given an ID. We could base this on the ISBN or the title, but those are longer than 8 characters and SPSS won't do much with strings longer than 8 characters. By using an ID, we can compute other variables as needed.
Price for the book at your store. Enter the value of -1 if your store does not carry that book.

Computed data

Retail price for book. Based on publisher's information.
40% off the retail price.
1 = Hard Cover, 2 = Paperback
1 = NonFiction, 2 = Fiction, 3 = Advice
1 = Yes (store is online), 2 = No (store is brick and mortar)
Combined format / class for grouping purposes. 1 = Hardcover Nonfiction, 2 = Hardcover Fiction, 3 = Hardcover Advice, 4 = Paperback Nonfiction, 5 = Paperback Fiction, 6 = Paperback Advice
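Two of the computed variables follow directly from the entered ones, and a short Python sketch can make the coding concrete. The function names are hypothetical, and rounding the discount price to the cent is an assumption; the codes themselves are the ones listed above.

```python
DISCOUNT_RATE = 0.40  # "40% off the retail price"

def discount_price(retail):
    """Retail price less 40%, rounded to the cent (the rounding is an assumption)."""
    return round(retail * (1 - DISCOUNT_RATE), 2)

def combined_group(book_format, book_class):
    """Map format (1 = Hard Cover, 2 = Paperback) and class
    (1 = NonFiction, 2 = Fiction, 3 = Advice) to the combined code 1-6."""
    if book_format not in (1, 2) or book_class not in (1, 2, 3):
        raise ValueError("format must be 1-2 and class must be 1-3")
    return (book_format - 1) * 3 + book_class

print(discount_price(25.00))  # hypothetical retail price
print(combined_group(1, 2))   # Hardcover Fiction -> 2
print(combined_group(2, 3))   # Paperback Advice -> 6
```

The formula (format - 1) * 3 + class reproduces the six codes in order: the three hardcover classes come first, then the three paperback classes.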

SPSS Notes

We will be using SPSS to do the statistical work for this project. SPSS is available in S137 and on the machines against the wall in C239. SPSS is commercially licensed software and you are not permitted to take a copy home, so plan on allowing some time at Richland to work on this project.

It is highly recommended that you go through the tutorial to become familiar with SPSS software.

You will open and save files from the academic file server. By doing this, the data and output that you create will be available anywhere in the College and not just on the machine where you work. You may wish to have a floppy disk to make a backup copy.

When you turn on a machine, it will ask for a password for Microsoft Networking. This is so that you can get access to the file and print servers. The login is "student" and the password is "richland".

The path that you will enter when you open or save files will be: \\acad2\stats\01\

Be sure to give the files you save distinct names. Use your name or the name of your store.

There are two types of files that we will be working with. Data files have a .SAV extension and the output Viewer Documents have a .SPO extension. If you have trouble finding the file you're looking for, make sure the document type is set correctly.

If you print upstairs in C239, make sure you select the "PostScript" printer (the one with PS at the end of the name) or SPSS will crash. For this reason, as well as others, it is highly recommended that you save your files before printing.

When the instructions ask you to filter or select cases, go into Data / Select Cases. Choose IF, and then specify a condition like "store=1".

When the instructions ask you to generate output for different classification variables (format, class, online, etc), use the Data / Split File command. Use "compare groups by" and then choose the classification variable.
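Conceptually, Select Cases keeps only the rows that meet a condition, and Split File repeats an analysis once per value of a classification variable. The Python sketch below mimics both on a small made-up data set; the field name "price", the function names, and all the values are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

def select_cases(rows, store):
    """Keep only the rows for one store, like the condition store=1."""
    return [row for row in rows if row["store"] == store]

def split_by(rows, variable):
    """Group prices by a classification variable, like 'compare groups by'."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[variable]].append(row["price"])
    return dict(groups)

# Hypothetical rows for illustration only.
rows = [
    {"store": 1, "format": 1, "price": 15.00},
    {"store": 1, "format": 2, "price": 7.00},
    {"store": 1, "format": 1, "price": 17.00},
    {"store": 2, "format": 1, "price": 14.00},
]

mine = select_cases(rows, store=1)
for value, prices in sorted(split_by(mine, "format").items()):
    print(value, mean(prices))
```

In SPSS both settings stay in effect until you turn them off, so remember to reset them before running an analysis on all of the stores.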