Even More About Regression

November 4, 2008 by  

An extension of correlation, regression allows you to see if the data looks like a straight line. Obviously, if your data is cyclical, a straight line won’t represent it very well. But if there is a positive or negative trend, a straight line is a good model. It is not so much that we apply the model to the data; it is more that we collect the data and ask if it looks like this model (linear), that model (circular or cyclic) or that model (chance).

If the data approximates a straight line, you can then use that information to predict what will happen in the future. Predicting the future assumes, of course, that conditions remain the same. The stock market is hard to predict because it keeps changing: up and down, slowly up, quickly down. It’s too erratic to predict its future, particularly in the short run.

If you roll a bowling ball down a lane and measure the angle at which it is traveling, you can predict where the ball will hit when it reaches the pins. The size, temperature and shape of the bowling lane are assumed to remain constant for the entire trip, so a linear model would work well with this data. If you use the same ball on a grass lane which has dips and bulges, the conditions are not constant enough to accurately predict its path.

Predicting the future also assumes that the relationship between the two variables is strong. A weak correlation will produce a poor line of prediction. Only strong (positive or negative) correlations will produce accurate predictions.

A regression is composed of three primary characteristics. Any two of these three can be used to draw a regression line: pivot point, slope and intercept.

First, the regression line always goes through the point where the mean of X and the mean of Y meet. This is reasonable since the best prediction of a variable (knowing nothing else about it) is its mean. Since the mean is a good measure of central tendency (where everyone is hanging out), it is a good measure to use.

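Second, a regression line has slope. Slope indicates the rate of change in Y, given a change of 1 in X. It equals the correlation multiplied by the ratio of the two standard deviations (the standard deviation of Y divided by the standard deviation of X). So if the correlation between X and Y is perfect and positive, and both variables have the same spread (as they do when expressed as standard scores), slope will be 1; every time X gets larger by 1, Y will get larger by 1.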

Third, a regression line has a Y intercept: the place where the regression line crosses the Y axis. Think of it as the intersection between the sloping regression line and the vertical axis.
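
To see the three characteristics working together, here is a minimal Python sketch. The scores and the helper name fit_line are made up for illustration; they do not come from any particular data set or library. It finds the least-squares slope, gets the intercept from the pivot point and the slope, and then checks that the line passes through the point where the two means meet.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired scores."""
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of cross-products and sum of squared deviations of X.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    # The line passes through (mean_x, mean_y), so the pivot point
    # and the slope together determine the intercept.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical scores, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
predicted_at_mean_x = intercept + slope * mean(xs)

print("slope:", slope)
print("intercept:", intercept)
print("prediction at mean of X:", predicted_at_mean_x, "mean of Y:", mean(ys))
```

With these made-up scores the slope is about 0.6 and the intercept about 2.2, and the prediction at the mean of X lands on the mean of Y. Notice that the intercept never had to be found separately: knowing any two of the three characteristics pins down the third.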

Regression means to go back to something. We can regress to our childhood; regress out of a building (leave the way we came in). Or regress back to the line of prediction. Instead of looking at the underlying data points, we use the line we’ve created to make predictions. Instead of relying on real data, we regress to our prediction line.

There are two major determinants of a prediction’s accuracy: (a) the amount of variance the predictor shares with the criterion and (b) the amount of dispersion in the criterion.

Taking them in order, if the correlation between the two variables is not strong, it is very difficult to predict from one to the other. In a strong positive correlation, you know that when X is low, Y is low. Knowing where one variable is makes it easy to estimate the general location of the other variable.

A good measure of predictability, therefore, is the coefficient of determination (calculated by squaring r). R-squared (r²) indicates the proportion of variance the two variables have in common. If r² is close to 1, there is a lot of overlap between the variables and it becomes quite easy to predict one from the other.
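
As a rough illustration, the sketch below computes r from its definition and then squares it. The scores and the function name pearson_r are assumptions made for the example, not part of any data set discussed here.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = mean(xs), mean(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical scores for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # coefficient of determination

print("r:", round(r, 3))
print("r squared:", round(r_squared, 3))
```

Here r is about 0.77, which sounds strong, but squaring it gives about 0.6: the two variables share roughly 60% of their variance, and the remaining 40% is what the prediction line cannot account for.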

Even when the correlation is strong, however, predictions are limited by the amount of dispersion in the criterion. Think of it this way: if everyone has the same score (or nearly so), it is easy to predict that score, particularly if the variable is correlated with another variable. But if everyone has a different score (lots of dispersion from the mean), guessing the correct value is difficult.

The standard error of estimate (SEE) takes both of these factors into consideration and produces a standard deviation of error around the prediction line. A prediction is presented as plus or minus its SEE.
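
One common textbook form of the standard error of estimate multiplies the standard deviation of the criterion by the square root of 1 minus r squared. The sketch below assumes that form; the function name and the numbers fed to it are hypothetical, and some texts add a small-sample correction (dividing by n minus 2) that is ignored here.

```python
from math import sqrt

def standard_error_of_estimate(sd_y, r):
    """Spread of errors around the prediction line.

    sd_y: standard deviation of the criterion (dispersion)
    r:    correlation between predictor and criterion (shared variance)
    """
    return sd_y * sqrt(1 - r ** 2)

# No correlation: the error is as wide as the criterion itself.
print(standard_error_of_estimate(10, 0.0))

# Strong correlation, modest dispersion: a much tighter error band.
print(round(standard_error_of_estimate(5, 0.8), 3))

# Same correlation, twice the dispersion: twice the error.
print(round(standard_error_of_estimate(10, 0.8), 3))
```

Both factors show up directly: a larger r shrinks the error, and a larger standard deviation in the criterion stretches it.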

The true score will be within 1 standard error of estimate of the regression line about 68% of the time, assuming the errors are roughly normally distributed. If the predicted score is 15 (just to pick a number) and the SEE is 3, we’re 68% sure that the real score is 15 plus or minus 3.

Similarly, we’re about 95% sure that the real score falls within 2 SEE of the prediction (15 plus or minus 6). And we’re about 99.7% sure that the real score falls within 3 SEE of the prediction (15 plus or minus 9).
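
The bands can be sketched in a few lines of Python, using the predicted score of 15 and an SEE of 3 from the example. The function names are made up for illustration, and the coverage figures assume the errors around the line follow a normal distribution.

```python
from statistics import NormalDist

def prediction_band(predicted, see, k):
    """Return (low, high): the predicted score plus or minus k SEE."""
    return predicted - k * see, predicted + k * see

def coverage(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    low, high = prediction_band(15, 3, k)
    print(f"within {k} SEE: {low} to {high}, coverage about {coverage(k):.1%}")
```

Under the normal-error assumption the three bands cover roughly 68%, 95% and 99.7% of real scores. Wider bands buy more confidence at the cost of a less precise prediction.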

 

 

