Finding the Regression Equation

Notes about the September 13, 2004, lecture.

The number one issue on Sep 13 was finding the regression equation. Many of you asked for more paper and pencil work rather than Minitab work. Others of you asked why we're doing so much Minitab if it's not on the test. Let me address the second issue first.

Soapbox mode on

The output from Minitab is on the test. You are not required to go through and enter data and generate the output yourself, but you will be required to interpret the output from Minitab. The first day of class, I told you the emphasis was on understanding the statistics, not finding them. We do not have a heavy emphasis on calculations on the test, nor do we have a heavy emphasis on the technology used to obtain the results. We do have a heavy emphasis on understanding the material.

The test is not the only reason we cover material in this class. In fact, only 62.5% of your grade is based on the tests. Another 12.5% of the grade is based on the technology projects and that does require you to go through and use Minitab. Furthermore, just because something isn't on the test doesn't mean it's not important. I have to write a test that you can get done in the time allocated and if I included everything that I thought was important, you wouldn't have time to finish the test. Instead of complaining that we're covering things that aren't going to be on the test, you should be thankful that not everything we cover is going to be on the test. If it bothers you that much, I can stop telling you what's going to be on the test.

Also, you should be reading your book before I cover the material. Many of the questions people asked would be answered by reading the text. I am not going to cover everything in class that will be on the test because I am expecting you to read your textbook. Many of you requested additional paper and pencil practice. The problems at the end of the chapter in the text provide that.

Soapbox mode off

Finding the Regression Equation from a Graph

Okay, now for the question at hand.

How do we find the regression equation?

Baseball Statistics

The 50 Major League players with the most at bats as of 5 pm on Sep 13, 2004, were selected, and the number of runs scored (y) was compared with the number of at bats (x).

The Pearson correlation coefficient of atbats and runs was 0.144 and the p-value was 0.318.
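
As a quick check on where that p-value comes from, here is the usual t statistic for testing whether a correlation is zero, worked with the numbers above. This assumes Minitab reports the standard two-sided test with n - 2 degrees of freedom; the notes themselves do not say which test was used.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.144\sqrt{50-2}}{\sqrt{1-0.144^{2}}}
  \approx \frac{0.998}{0.990}
  \approx 1.008,
\qquad \text{df} = n - 2 = 48
```

A t value this close to 1 with 48 degrees of freedom is in line with a two-sided p-value of about 0.32, so there is no significant linear correlation between at bats and runs for these players.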

Consider the following graph. Each variable has a line drawn at its mean, and each axis is marked off in steps of one standard deviation for that variable.


Use the information to find the regression equation.

Step 1 - Determine the Summary Statistics from the Graph

Variable           Mean     St. Dev.
At Bats (x)        ____     ____
Runs Scored (y)    ____     ____

Step 2 - Calculate the Slope, b1

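The slope is b1 = r (sy / sx), where r is the correlation coefficient, sy is the standard deviation of y, and sx is the standard deviation of x.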

Round your answer to three decimal places and then enter the value for the slope, b1 =
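
If you want to check your arithmetic for Step 2, here is a minimal sketch in Python. The function name and the numbers are made up for illustration; they are not the values you should read from the baseball graph.

```python
def slope_from_summary(r: float, s_x: float, s_y: float) -> float:
    """Slope of the regression line from summary statistics: b1 = r * (s_y / s_x)."""
    if s_x <= 0:
        raise ValueError("s_x must be a positive standard deviation")
    return r * (s_y / s_x)


# Hypothetical summary statistics (NOT the baseball values).
r = 0.5      # correlation coefficient
s_x = 20.0   # standard deviation of x
s_y = 8.0    # standard deviation of y

# Round to three decimal places, as the step asks.
b1 = round(slope_from_summary(r, s_x, s_y), 3)
print("b1 =", b1)
```

Notice that the slope has the same sign as r, and that it is just r rescaled by the ratio of the two standard deviations.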

Step 3 - Find the Constant, b0

The regression equation is ŷ = b0 + b1 x

The regression equation always passes through the centroid, (x-bar, y-bar), which is the (mean of x, mean of y). That means you know an x and y coordinate on the line (use the means from step 1) and a slope (from step 2). Substitute the values of x, y, and b1 into the equation for the regression line and solve for the y-intercept, b0.

When you find your answer for the constant, round it to three decimal places and enter it here. b0 =
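
Written out in symbols, the substitution in Step 3 looks like this. It is only the algebra for the general case, using the same b0, b1, x-bar, and y-bar as above; you still need to plug in your own values.

```latex
\bar{y} = b_0 + b_1\,\bar{x}
\quad\Longrightarrow\quad
b_0 = \bar{y} - b_1\,\bar{x}
```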

Step 4 - Write the equation

Once you have found the slope and y-intercept, write the equation of the line.

runs = ____ + ____ atbats

Step 5 - Estimate the number of runs a person with 600 at bats would be expected to score.

This question may not always be asked, but estimation is generally why we perform regression.

Round your answer to three decimal places if necessary and enter it here. Runs =
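
Steps 3 through 5 can be sketched in Python as follows. The function names and all of the numbers are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the baseball values, so do not expect them to match your answers.

```python
def intercept_through_centroid(b1: float, mean_x: float, mean_y: float) -> float:
    """Solve mean_y = b0 + b1 * mean_x for b0 (the line passes through the centroid)."""
    return mean_y - b1 * mean_x


def predict(b0: float, b1: float, x: float) -> float:
    """Evaluate the regression equation y-hat = b0 + b1 * x."""
    return b0 + b1 * x


# Hypothetical values (NOT read from the baseball graph).
mean_x = 500.0   # mean of x
mean_y = 70.0    # mean of y
b1 = 0.05        # slope, as if already found in Step 2

b0 = round(intercept_through_centroid(b1, mean_x, mean_y), 3)
print(f"y-hat = {b0} + {b1} x")

# Step 5: estimate y when x = 600.
y_hat = round(predict(b0, b1, 600.0), 3)
print("estimate at x = 600:", y_hat)

# Sanity check: plugging in the mean of x should give back the mean of y.
assert abs(predict(b0, b1, mean_x) - mean_y) < 1e-9
```

The last line is a handy check on your own work too: if your equation does not return the mean of y when you plug in the mean of x, something went wrong in Step 2 or Step 3.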

See the baseball data used for this example.

Additional Problems

Problems 1 and 2 in chapter 8 give more practice where you are given some of the values and asked to find the others.