Stats: Regression


The idea behind regression is that when there is significant linear correlation, you can use a line to estimate the value of the dependent variable for certain values of the independent variable.

The regression equation should only be used when there is significant linear correlation between the two variables.

Assuming that you've decided that you can have a regression equation because there is significant linear correlation between the two variables, the equation becomes: y' = ax + b or y' = a + bx (some books use y-hat instead of y-prime). The Bluman text uses the second formula; however, more people are familiar with the notion of y = mx + b, so I will use the first.

a is the slope of the regression line:

$a = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$

b is the y-intercept of the regression line:

$b = \frac{\sum y - a \sum x}{n}$

The regression line is sometimes called the "line of best fit" or the "best fit line".

Since it "best fits" the data, it makes sense that the line passes through the means.

The regression equation is the line with slope a passing through the point $(\bar{x}, \bar{y})$, the mean of the x values and the mean of the y values.

Another way to write the equation would be:

$y' - \bar{y} = a(x - \bar{x})$

Apply just a little algebra, and we have the formulas for a and b that we would use (if we were stranded on a desert island without the TI-82) ...
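If you are stranded with a computer instead of a calculator, here is a minimal Python sketch of the hand calculation. It assumes the standard least-squares sum formulas for the slope and intercept and the y' = ax + b convention used on this page. The data values and the function name `regression_line` are made up for illustration; they are not from the text.

```python
# Hypothetical sample data, chosen only to illustrate the arithmetic.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def regression_line(xs, ys):
    """Return (a, b) for y' = a*x + b using the least-squares sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: a = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b = (Sy - a*Sx) / n, which is the same as y_bar - a*x_bar
    b = (sum_y - a * sum_x) / n
    return a, b


a, b = regression_line(xs, ys)

# The fitted line should pass through the point of means (x_bar, y_bar).
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(f"slope a = {a:.4f}, intercept b = {b:.4f}")
print(f"line at x_bar: {a * x_bar + b:.4f}, y_bar: {y_bar:.4f}")
```

The last line prints the height of the line at the mean of x next to the mean of y, so you can see for yourself that the line passes through the means.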

It also turns out that the slope of the regression line can be written as $a = r \frac{s_y}{s_x}$, where r is the correlation coefficient and $s_x$ and $s_y$ are the standard deviations of x and y. Since the standard deviations can't be negative, the sign of the slope is determined by the sign of the correlation coefficient. This agrees with the statement made earlier that the slope of the regression line will have the same sign as the correlation coefficient.
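To see the relationship between the slope and the correlation coefficient numerically, here is a small Python sketch. It assumes the usual Pearson formula for r and sample standard deviations (from Python's built-in `statistics.stdev`). The data and the function name `correlation` are hypothetical, made up for illustration.

```python
import math
import statistics

# Hypothetical sample data, chosen only to illustrate the arithmetic.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def correlation(xs, ys):
    """Return the Pearson linear correlation coefficient r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = correlation(xs, ys)
s_x = statistics.stdev(xs)  # sample standard deviation of x
s_y = statistics.stdev(ys)  # sample standard deviation of y

# Slope written in terms of r and the standard deviations.
slope = r * s_y / s_x

print(f"r = {r:.4f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}")
print(f"slope = r * s_y / s_x = {slope:.4f}")
```

Because `s_x` and `s_y` are never negative, `slope` and `r` always come out with the same sign.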

TI-82

Luckily, the TI-82 will find these values for us (isn't it a wonderful calculator?). We can also use the TI-82 to plot the regression line on the scatter plot.

