# Calculate: Correlation

November 5, 2008 by kltangen

Filed under Correlation, How To Calculate

For the sake of simplicity, we’ll restrict ourselves to the Pearson r, the most commonly used type of correlation. To calculate the Pearson r, three Sums of Squares are needed: SSx, SSy and SSxy. The Pearson r is the ratio of SSxy to the square root of the product of SSx and SSy. Here is the formula:

$$r = \frac{SS_{xy}}{\sqrt{SS_x \cdot SS_y}}$$

For SSx, find the Sum of Squares of the X variable. Similarly, SSy is simply the Sum of Squares of Y. The SSxy, however, is a bit different. **First**, we have to make a new variable: XY. To do so, we multiply each X by its respective Y. Now we have 3 columns: X, Y and XY. **Second**, sum the XYs. **Third**, use this formula, where N is the number of pairs of scores:

$$SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{N}$$

Notice that this formula is a lot like the regular formula for Sum of Squares ($SS_x = \sum X^2 - (\sum X)^2 / N$); it’s a variation on the theme. It’s the sum of the XYs but we don’t have to square them (they’re already big enough). And we don’t square the Sum of X; we multiply the Sum of X and the Sum of Y together. **Fourth**, finish off the formula and the result is the Pearson r.
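The four steps can be sketched in Python. This is only a minimal illustration: the function names (`sum_of_squares`, `sum_of_cross_products`, `pearson_r`) and the tiny data set at the bottom are made up for this sketch and are not part of the lesson.

```python
import math


def sum_of_squares(values):
    """SS = sum of squared scores minus (sum of scores) squared over N."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n


def sum_of_cross_products(xs, ys):
    """SSxy = sum of XY minus (sum of X times sum of Y) over N."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of scores")
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def pearson_r(xs, ys):
    """Pearson r = SSxy divided by the square root of SSx times SSy."""
    ss_x = sum_of_squares(xs)
    ss_y = sum_of_squares(ys)
    ss_xy = sum_of_cross_products(xs, ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


# Made-up data where Y is always exactly twice X (a perfect positive relationship).
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

Keeping SSxy in its own function mirrors the way the formula treats it as a variation on the regular Sum of Squares.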

EXAMPLE

We create a new variable by multiplying every X by its Y partner. So this:

| X | Y |
|---|---|
| 2 | 17 |
| 13 | 3 |
| 10 | 4 |
| 3 | 18 |
| 2 | 19 |
| 12 | 11 |

becomes this:

| X | Y | XY |
|---|---|---|
| 2 | 17 | 34 |
| 13 | 3 | 39 |
| 10 | 4 | 40 |
| 3 | 18 | 54 |
| 2 | 19 | 38 |
| 12 | 11 | 132 |

Then, we sum each column. The sum of X = 42, the sum of Y = 72, and the sum of XY is 337.
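If you would rather let a computer build the XY column, here is a short Python sketch using the example data. The variable names (`xs`, `ys`, `xy`) are just choices made for this illustration.

```python
# The six pairs of scores from the example.
xs = [2, 13, 10, 3, 2, 12]
ys = [17, 3, 4, 18, 19, 11]

# New variable: each X multiplied by its Y partner.
xy = [x * y for x, y in zip(xs, ys)]

# Sum each of the three columns.
sum_x, sum_y, sum_xy = sum(xs), sum(ys), sum(xy)

print(xy)                    # [34, 39, 40, 54, 38, 132]
print(sum_x, sum_y, sum_xy)  # 42 72 337
```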

Calculate the SS for X (136) and the SS of Y (256). Then calculate SSxy. Multiply the sum of X by the sum of Y (42 * 72 = 3024). Now divide the result by N (the number of pairs of scores = 6); 3024/6 = 504. Subtract the result from the sum of the XYs (337 - 504 = -167).
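The same three Sums of Squares can be checked with a few lines of Python. This sketch assumes the computational form of the Sum of Squares (sum of squared scores minus the squared sum over N); the variable names are illustrative only.

```python
xs = [2, 13, 10, 3, 2, 12]
ys = [17, 3, 4, 18, 19, 11]
n = len(xs)  # number of pairs of scores (6)

# Regular Sum of Squares: sum of squared scores minus (sum squared) / N.
ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n   # 430 - 1764/6
ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n   # 1120 - 5184/6

# SSxy: sum of the XYs minus (sum of X times sum of Y) / N.
ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n  # 337 - 3024/6

print(ss_x, ss_y, ss_xy)  # 136.0 256.0 -167.0
```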

Notice the SSxy is negative. It’s OK. The SSxy can be negative. It is the only Sum of Squares that can be negative. SSx and SSy are measures of dispersion around each variable’s mean. But we created the XY variable; it’s not a real variable when it comes to dispersion. The sign of SSxy indicates the direction of the relationship between X and Y. So we have a negative SSxy because X and Y have an inverse relationship.

Look at the original data: when X is small (2), Y is large (17). When X is large (13), Y is small (3). It is a consistent but inverse relationship. It’s like pushing the yoke down and the plane going up.
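One way to see where the negative sign comes from is to compute SSxy in its equivalent deviation form: the sum of (X minus the mean of X) times (Y minus the mean of Y). That form is not used in this lesson, so treat the sketch below as a side illustration under that assumption. It gives the same total as the shortcut formula.

```python
xs = [2, 13, 10, 3, 2, 12]
ys = [17, 3, 4, 18, 19, 11]

mean_x = sum(xs) / len(xs)  # 7.0
mean_y = sum(ys) / len(ys)  # 12.0

# For each pair: how far X is from its mean times how far Y is from its mean.
products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]

print(products)       # [-25.0, -54.0, -24.0, -24.0, -35.0, -5.0]
print(sum(products))  # -167.0
```

Every product is negative: whenever X is below its mean, Y is above its mean, and vice versa. That is the inverse relationship showing up pair by pair.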

Let’s finish off the calculation of the Pearson r. Multiply the SSx by the SSy (136 * 256 = 34816). Take the square root of that number (the square root of 34816 = 186.59). Divide the SSxy by that square root (-167/186.59 = -.895). Rounding to 2 decimal places, the Pearson r for this data set equals -.90. It is a strong, negative correlation.
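The last step, as a quick Python check. The three values are the Sums of Squares worked out in the example; the variable names are only for this sketch.

```python
import math

ss_x, ss_y, ss_xy = 136, 256, -167

product = ss_x * ss_y        # 34816
root = math.sqrt(product)    # about 186.59
r = ss_xy / root

print(f"{root:.2f}")  # 186.59
print(f"{r:.3f}")     # -0.895
print(f"{r:.2f}")     # -0.90
```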
