Even More About ANOR
November 9, 2008 by
Filed under Even More
A correlation is a measure of commonality: how much two variables have in common. For the Pearson r, we plotted two continuous variables and looked at the scatterplot of the data. We could see if the trend was generally positive, negative, or had no linear pattern. We then used a regression for prediction. We drew a regression line through the data as best we could and used the line to make predictions.
An analysis of regression looks at the pattern of data and compares it to the regression line drawn through it. It asks how well the data looks like a straight line. This is a yes-no comparison. We start with the premise that the data doesn’t look like a straight line. We assume that there is no pattern. When we see small variations from a chance pattern, we still don’t accept the model of a straight line. We only change our minds when the pattern is so strong that it is significant.
Our test of significance is a ratio of knowledge. We’re going to compare the variance we understand to the variance that is unexplained. We are going to compare variation between people to the variation within a person’s performance. Later, we will use this procedure to compare differences between experimental groups to variation within each group. That is, we will compare between-groups variance to error variance (within-groups variance).
For the present, we can use the same test (Fisher’s F) to test the significance of a regression. Does the data we collected approximate a straight line? To find out, we’re going to divide the variance the two variables share by the variance they don’t share.
To make the process easy, the F test uses a summary table. We just fill in the gaps. The table looks like this:
- Source | Sum of Squares | df | Mean Squares
- SSregression | | |
- SSerror | | |
- SStotal | | |
Assuming we’ve already calculated the sum of squares for each variable (X and Y) and the SSxy, filling out the table is really easy. It’s a three-step process. First, we find the Sum of Squares for each component. Starting at the bottom row, SStotal equals SSy. No further calculation is necessary to fill in that answer.
Let’s use an example and follow it through the process. Here’s the data:
- X Y
- 2 4
- 5 7
- 3 9
- 6 8
- 11 10
- 12 10
In this example, the SSx is 85.5, the SSy is 26 and the SSxy is 36. The correlation between the two variables equals .76. So in our summary table, we put the SSy as the SStotal, and the table looks like this:
- Source | Sum of Squares | df | Mean Squares
- SSregression | | |
- SSerror | | |
- SStotal | 26 | |
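As a check on the numbers above, here is a minimal Python sketch that computes SSx, SSy, SSxy and r directly from the six pairs of scores. The helper names `sum_of_squares` and `sum_of_products` are illustrative choices, not part of the original lesson.

```python
import math

# Data from the example: one (X, Y) pair per person.
x = [2, 5, 3, 6, 11, 12]
y = [4, 7, 9, 8, 10, 10]

def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def sum_of_products(xs, ys):
    """SSxy: sum of cross-products of deviations from each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))

ss_x = sum_of_squares(x)
ss_y = sum_of_squares(y)        # this becomes SStotal in the summary table
ss_xy = sum_of_products(x, y)
r = ss_xy / math.sqrt(ss_x * ss_y)  # Pearson r

print(ss_x, ss_y, ss_xy, round(r, 2))  # 85.5 26.0 36.0 0.76
```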
Moving to the other two rows, we partition the total Sum of Squares (SSy) into explained SS and unexplained SS. Explained SS is simply the SSy multiplied by r² (which is called the coefficient of determination). The result is the SSregression; it’s the SS we understand (the part the two variables share). In this example, r² = .76² ≈ .58, so the SSregression is 26 times .58, which equals 15.08.
Similarly, the SSerror is the SStotal times the coefficient of nondetermination (1 − r²); in this case that would be .42 times 26 = 10.92. Of course, you could also subtract the SSregression from the SStotal. Either way will work.
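The partition in step 1 can be sketched in a few lines of Python. This assumes, as the text does, that r is rounded to .76 and r² to .58 before multiplying; the variable names are hypothetical. Carrying r² at full precision (36² / (85.5 × 26) ≈ .583) gives an SSregression of about 15.16 and an SSerror of about 10.84 instead, a small rounding difference.

```python
# Values given in the text for the example data.
ss_total = 26.0   # SStotal = SSy
r = 0.76          # Pearson r, rounded to two places as in the text

r_squared = round(r ** 2, 2)            # coefficient of determination, rounded to .58
ss_regression = ss_total * r_squared    # explained SS
ss_error = ss_total * (1 - r_squared)   # unexplained SS (coefficient of nondetermination)

# Subtracting SSregression from SStotal gives the same SSerror.
assert abs(ss_error - (ss_total - ss_regression)) < 1e-9

print(round(ss_regression, 2), round(ss_error, 2))  # 15.08 10.92

# Assumption for comparison: r^2 computed at full precision as SSxy^2 / (SSx * SSy).
r_squared_exact = 36 ** 2 / (85.5 * 26)
print(round(ss_total * r_squared_exact, 2))  # 15.16
```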
So far, at the end of step 1, the summary table would look like this:
- Source | Sum of Squares | df | Mean Squares
- SSregression | 15.08 | |
- SSerror | 10.92 | |
- SStotal | 26 | |
We’ve partitioned the Sum of Squares into the portion explained by the regression (15.08) and the portion that is due to error (10.92). In this context, anything that isn’t explained by the regression line is considered error.
Step 2 is to identify the appropriate degrees of freedom for each Sum of Squares. You’ll recall that variance is SS divided by its degrees of freedom. A distribution of scores that is the entire population has a df of N, and a sample drawn from a population has a df of N − 1. In this case, we have two variables, so finding the appropriate degrees of freedom is a bit different.
The degrees of freedom (df) for Regression is the number of columns minus one (written k − 1, where k is the number of columns). Since a simple linear regression has only 2 columns, the df for an Analysis of Regression always equals 1. The df for Error is N − k (the number of people minus the number of columns). And the df for Total is N − 1.
We complete Step 2 by filling in the appropriate degrees of freedom. There are two columns (X and Y) in our ANOR, so df for regression equals 1. There are six people in the study (2 scores for each person), so the df for error is N − k (6 minus 2), which equals 4. And the total degrees of freedom is N − 1 (6 minus 1), which equals 5. So our summary table now looks like this:
- Source | Sum of Squares | df | Mean Squares
- SSregression | 15.08 | 1 |
- SSerror | 10.92 | 4 |
- SStotal | 26 | 5 |
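The degrees-of-freedom rules can be written as a small function. `anor_degrees_of_freedom` is a hypothetical name, and `k` defaults to 2 on the assumption of a simple linear regression with one X column and one Y column.

```python
def anor_degrees_of_freedom(n, k=2):
    """Hypothetical helper: df for each row of an analysis of regression.

    n: number of people (pairs of scores)
    k: number of columns (variables); 2 for a simple linear regression
    """
    df_regression = k - 1
    df_error = n - k
    df_total = n - 1
    return df_regression, df_error, df_total

print(anor_degrees_of_freedom(6))  # (1, 4, 5)
```

Unlike the mean squares, the df column does add up: 1 + 4 = 5.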
Step 3 is to convert Sum of Squares to variance. We divide each SS by its respective degrees of freedom. The resulting variance terms are called mean squares (just to confuse you). I suppose it is a reminder that variance is the average of the squared deviations from a distribution’s mean. But changing the names does make it harder. Here are the results of our 3-step process:
- Source | Sum of Squares | df | Mean Squares
- SSregression | 15.08 | 1 | 15.08
- SSerror | 10.92 | 4 | 2.73
- SStotal | 26 | 5 | 5.20
Of course, the mean squares won’t add up like the other columns because we divided by different amounts. But the resulting variance terms are appropriate for their respective portions.
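Step 3 is just division, sketched here in Python with the sums of squares and df from the summary table. The dictionary layout is an illustrative choice; the last lines show that the mean squares are not additive.

```python
# Sums of squares and degrees of freedom from the summary table.
table = {
    "SSregression": (15.08, 1),
    "SSerror": (10.92, 4),
    "SStotal": (26.0, 5),
}

# Mean square = SS / df, i.e., each row's variance estimate.
mean_squares = {source: ss / df for source, (ss, df) in table.items()}

for source, (ss, df) in table.items():
    print(f"{source:<13}{ss:>8.2f}{df:>4}{mean_squares[source]:>8.2f}")

# Unlike SS and df, the mean squares do not add up to the total row.
ms_sum = mean_squares["SSregression"] + mean_squares["SSerror"]
print(round(ms_sum, 2), "vs", mean_squares["SStotal"])  # 17.81 vs 5.2
```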
To calculate F itself, we divide the mean squares of regression by the mean squares of error. In this example, 15.08 divided by 2.73 gives F = 5.52.
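The F ratio itself, sketched in Python with the values from the summary table (variable names are illustrative):

```python
ms_regression = 15.08 / 1  # SSregression / df for regression
ms_error = 10.92 / 4       # SSerror / df for error

# F = variance explained by the regression / unexplained (error) variance
f_ratio = ms_regression / ms_error
print(round(f_ratio, 2))  # 5.52
```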
In order for the value we calculated to be deemed significant, it must be equal to or larger than a standard (critical) value for those degrees of freedom. We compare the F we calculated to the F table at the back of nearly any statistics book. To find the right value, we select the first column (the same value as the df for SSregression). And to find the correct row, we go down to the row labeled 4 (the same value as the df for SSerror). In this case, the book value at the .05 level of significance is 7.71. Our F value was 5.52, so we lose.
Well, lose isn’t really the right word, but if you think of research as trying to beat the book value, it will help you remember how to make the comparison. To be significant, F has to be equal to or larger than the value in the Critical Values of F table. If our F is equal to or larger than the book value, we win. If our F is smaller than the book’s, we lose.
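The win/lose comparison can be expressed as a one-line test. The critical value 7.71 is copied from a Critical Values of F table for the .05 level with 1 and 4 degrees of freedom; the code does not compute it, and `is_significant` is a hypothetical helper name.

```python
# Book value from a Critical Values of F table: .05 level,
# df = 1 (regression) and 4 (error). Copied from the table, not computed.
F_CRITICAL = 7.71

def is_significant(f_observed, f_critical):
    """Hypothetical helper: 'win' when F is equal to or larger than the book value."""
    return f_observed >= f_critical

f_observed = 5.52  # F from the example
print("win" if is_significant(f_observed, F_CRITICAL) else "lose")  # lose
```

Using `>=` rather than `>` matches the rule that an F equal to the book value counts as significant.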
The proper explanation is that F tells us how unlikely our result would be if only chance were at work. If our F is smaller than the book value, what we see could reasonably be due to chance. If our F is equal to or larger than the book value, the relationship between the variables is unlikely to be due to chance alone.
The F test doesn’t tell us what causes what, only whether a result is likely to occur by chance or not. In this example, X is not a significant predictor of Y: the apparent linear relationship between them can be explained by chance.
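Putting the three steps together, here is a sketch of the whole analysis of regression from raw scores, assuming a simple linear regression (k = 2). Unlike the hand calculation, it carries r² at full precision, so F comes out near 5.59 rather than 5.52; either way it falls below the book value of 7.71. The function name `analysis_of_regression` is hypothetical.

```python
def analysis_of_regression(x, y):
    """Hypothetical sketch: F and df for a simple linear regression (k = 2)."""
    n, k = len(x), 2
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r_squared = ss_xy ** 2 / (ss_x * ss_y)  # coefficient of determination

    # Step 1: partition SStotal (= SSy) into explained and error SS.
    ss_total = ss_y
    ss_regression = ss_total * r_squared
    ss_error = ss_total - ss_regression

    # Step 2: degrees of freedom.
    df_regression, df_error = k - 1, n - k

    # Step 3: mean squares, then F.
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return ms_regression / ms_error, (df_regression, df_error)

f_ratio, dfs = analysis_of_regression([2, 5, 3, 6, 11, 12], [4, 7, 9, 8, 10, 10])
print(round(f_ratio, 2), dfs)  # 5.59 (1, 4)
```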