Even More About Correlation

November 3, 2008 by  
Filed under Even More

Correlations don’t prove causation. A strong correlation is a necessary indicator of causation, but it is not sufficient. When a cause-effect relationship exists, we generally expect a strong correlation between the variables. But a strong correlation does not mean that variable A causes variable B.

In correlations, A can cause B. Or, just as likely, B can cause A. Or, just as likely, something else (call it C) causes both A and B to occur.
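The third possibility is easy to see in a small simulation. The sketch below is only an illustration with made-up numbers: a hidden variable C drives both A and B, and neither A nor B influences the other. The names `pearson_r`, `a`, `b`, and `c`, the sample size, and the noise levels are all assumptions chosen for the example.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)  # fixed seed so the run is repeatable

# C is the hidden common cause. A and B each equal C plus their own noise;
# A never feeds into B, and B never feeds into A.
c = [random.gauss(0, 1) for _ in range(1000)]
a = [ci + random.gauss(0, 0.5) for ci in c]
b = [ci + random.gauss(0, 0.5) for ci in c]

r_ab = pearson_r(a, b)
print(f"r between A and B: {r_ab:.2f}")
```

A and B come out strongly correlated even though the only causal arrows run from C. Looking at the correlation alone, there is no way to tell this setup apart from one where A causes B.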

For a simple example, let’s assume that we know nothing about science. But we do notice that when the sun comes up, it gets warm outside. From a statistical point of view, we can’t tell which causes which. Perhaps the sun coming up makes it get warm. But it is as likely that when it gets warm the sun comes up. Or the sun and warmth are caused by something else: a dragon (pulling the sun behind it) flies across the sky blowing its hot breath on the earth (making it warm).

You might laugh at this illustration but think how shocked you’d be if tomorrow it got warm and the sun didn’t come up!

It is, of course, perfectly OK to infer causation from correlational data. But we must remember that these inferences are not proofs; they are leaps of faith. Leaping is allowed, but we must clearly indicate that the inference is an assumption, not a fact.

Reliability & Validity

Although correlations can’t prove cause and effect, they are very useful for measuring reliability and validity. Reliability means that you get the same results every time you use a test. If you’re measuring the temperature of a liquid and get a reading of 97 degrees, you would expect a reliable thermometer to yield the same result a few seconds later. If your thermometer gives different readings of the same source over a short period of time, it is unreliable and you would throw it away.

We expect many things in our lives to be reliable. When you flip on a light switch, you expect the light to come on. When you get on an elevator and push the “down” button, you don’t expect the elevator to go sideways. If you twice measure the length of a table, a reliable tape measure will yield the same result. Even if your measuring skill is poor, you expect the results to be close (not 36 inches and then 4 inches). You expect the same results every time.

Reliability, then, is the correlation between two observations of the same event. Test reliability is determined by giving the test once and then giving the same test to the same people 2 weeks later. With this test-retest method, you would expect a high positive correlation between the scores from the first time the test was given and the scores from the second time.

A test with a test-retest reliability of .90 (which many intelligence tests have) is highly reliable. A correlation of .45 shows a moderate amount of reliability, and a coefficient close to zero indicates the test is unreliable. Obviously, a negative test-retest reliability coefficient would indicate something was wrong. People who got high scores the first time should be getting high scores the second time, if the test is reliable.
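As a rough sketch of the test-retest method, the example below correlates two sets of scores from the same people. The scores are hypothetical, invented only to show the calculation, and `pearson_r` is a helper written for this example rather than part of any particular package.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical scores for six people who took the same test twice,
# two weeks apart. Position i in each list is the same person.
first_testing = [85, 92, 78, 60, 95, 70]
second_testing = [88, 90, 75, 65, 97, 72]

reliability = pearson_r(first_testing, second_testing)
print(f"test-retest reliability: {reliability:.2f}")
```

People who scored high the first time also scored high the second time, so the coefficient is high and positive, which is what a reliable test should produce.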

There are 3 basic types of reliability correlations. A test-retest coefficient is obtained by giving and re-giving the test. A “split half” correlation is found by correlating the total score for the first half with the total score for the second half for each subject. A parallel forms correlation shows the reliability of two tests with similar items.
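The split-half calculation can be sketched the same way. In this illustration each row holds one subject's item scores (1 = correct, 0 = incorrect) on a hypothetical 8-item test. The data and the names `item_scores` and `pearson_r` are assumptions made up for the example.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical item scores: one row per subject, one column per item.
item_scores = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 0],
]

half = len(item_scores[0]) // 2
# Total score on the first half and on the second half, for each subject.
first_half = [sum(row[:half]) for row in item_scores]
second_half = [sum(row[half:]) for row in item_scores]

split_half_r = pearson_r(first_half, second_half)
print(f"split-half correlation: {split_half_r:.2f}")
```

Only one testing session is needed for this method, because each subject's two half-scores stand in for the two observations.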

Correlations also can be used to measure validity. Although a reliable test is good, it is possible to be reliably (consistently) wrong. Validity is the correlation between a test and an external criterion. If you create a test of musical ability, you expect that musicians will score high on the test and those judged by experts to be unmusical will score low on the test. The correlation between the test scores and the experts’ ratings is a measure of validity.

Validity is whether a test measures what it says it measures; reliability is whether a test is consistent. Clearly, reliability is necessary but not sufficient for a test to be valid.

Significance

It is possible to test a correlation coefficient for significance. A significant correlation means the relationship is not likely to be due to chance. It doesn’t mean that X causes Y, that Y causes X, or that another variable causes both X and Y. Although a correlation cannot prove which causes what, r can be tested to see if it is likely to be due to chance.

First, determine the degrees of freedom for the study. The degrees of freedom (df) for a correlation are N-2. If there are 7 people (pairs of scores), the df = 5. If there are 14 people, df = 12.

Second, enter the statistical table “Critical Values of the Pearson r” with the appropriate df. Let’s assume there were 10 people in the study (10 pairs of scores). That would mean the degrees of freedom for this study equals 8.

Go down the df column to 8, and you’ll see that with this few people, a Pearson r must have a magnitude of .632 or larger to be significant.

Notice that the table ignores the sign of the correlation. A negative correlation of -.632 or larger (closer to -1) would also be significant.
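The lookup steps above can be sketched in a few lines. The text does not say which significance level its table uses; the values below are an assumption, taken from the two-tailed .05 column of a standard Pearson r table, which is the column the .632 figure for df = 8 matches. Only three rows are reproduced, and the names `CRITICAL_R_05` and `is_significant` are made up for the example.

```python
# A few rows of a critical-values table for Pearson r,
# two-tailed, .05 level (assumed), keyed by degrees of freedom.
CRITICAL_R_05 = {5: 0.754, 8: 0.632, 12: 0.532}

def is_significant(r, n_pairs, table=CRITICAL_R_05):
    """Return True if |r| reaches the critical value for df = N - 2."""
    df = n_pairs - 2          # step 1: degrees of freedom
    critical = table[df]      # step 2: look up the critical value
    return abs(r) >= critical # the sign of r is ignored

print(is_significant(0.70, 10))   # 10 pairs, df = 8
print(is_significant(-0.70, 10))  # negative r, same magnitude
print(is_significant(0.50, 10))   # below the critical value
```

The first two calls return True and the third returns False, since .70 exceeds .632 in magnitude regardless of sign and .50 does not. A df that is missing from the small dictionary raises a KeyError, so a full table would be needed for real use.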

Evaluate r-squared

A correlation can’t prove that A causes B; it could be that B causes A…or that C causes both A & B. The coefficient of determination is an indication of the amount of relationship between the two variables. It gives the percentage of variance that is accounted for by the relationship between the two variables.

To calculate the coefficient of determination, simply take the Pearson r and square it. So, .89 squared = .79. In this example, 79% of the variance can be explained by the relationship between the two variables. Using a Venn diagram, it is possible to see the relationship between the two variables. It is the area of overlap.

To calculate the amount of variance that is NOT explained by the relationship (called the coefficient of non-determination), subtract r-squared from 1. In our example, 1 - r-squared = .21. That is, 21% of the variance is unaccounted for.
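The arithmetic for both coefficients, using the r of .89 from the example, can be checked with a short sketch. The variable names are chosen only for illustration.

```python
r = 0.89  # Pearson r from the example in the text

r_squared = r ** 2                 # coefficient of determination
non_determination = 1 - r_squared  # coefficient of non-determination

print(f"explained:   {r_squared:.2f}")
print(f"unexplained: {non_determination:.2f}")
```

The two values always sum to 1, so once r-squared is known, the unexplained share follows directly.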

 
