For the sake of simplicity, we’ll restrict ourselves to the Pearson r, the most commonly used type of correlation. To calculate the Pearson r, three Sums of Squares are needed: SSx, SSy and SSxy. The Pearson r is the ratio of SSxy to the square root of the product of SSx and SSy. Here is the formula:

$$ r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} $$
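As a quick check on the formula, here is a minimal Python sketch that transcribes it directly. It assumes the three Sums of Squares have already been computed. The function name `pearson_r` is an illustrative choice, and the numbers in the usage lines are made up so the arithmetic is easy to verify by hand.

```python
import math

def pearson_r(ss_x: float, ss_y: float, ss_xy: float) -> float:
    """Pearson r = SSxy / sqrt(SSx * SSy)."""
    if ss_x <= 0 or ss_y <= 0:
        # A variable whose scores are all identical has SS = 0,
        # so r is undefined (we would divide by zero).
        raise ValueError("SSx and SSy must both be positive")
    return ss_xy / math.sqrt(ss_x * ss_y)

print(pearson_r(4, 9, 6))    # 6 / sqrt(36) = 1.0
print(pearson_r(4, 9, -3))   # -3 / sqrt(36) = -0.5
```

The guard clause is a design choice of this sketch: the formula itself says nothing about zero spread, but dividing by the square root of zero is undefined.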
For SSx, find the Sum of Squares of the X variable. Similarly, SSy is simply the Sum of Squares of Y. The SSxy, however, is a bit different. First, we have to make a new variable: XY. To do so, we multiply each X by its respective Y. Now we have 3 columns: X, Y and XY. Second, sum the XYs. Third, use this formula:

$$ SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{N} $$
Notice that this formula is a lot like the regular formula for Sum of Squares; it’s a variation on the theme. It starts with the sum of the XYs, but we don’t square them (each XY is already the product of two scores, just as X² is the product of X with itself). And instead of squaring the Sum of X, we multiply the Sum of X by the Sum of Y. Fourth, plug SSx, SSy and SSxy into the Pearson formula, and the result is the Pearson r.
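Here is one way to sketch the three Sums of Squares in Python, starting from raw scores. The function names `sum_of_squares` and `ss_xy` are illustrative, and the X and Y lists are hypothetical data chosen for easy arithmetic. They are not the data set used in this lesson.

```python
def sum_of_squares(values):
    """SS = (sum of squared scores) - (sum of scores)^2 / N."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n

def ss_xy(xs, ys):
    """SSxy = (sum of XY) - (sum of X)(sum of Y) / N."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of scores")
    n = len(xs)
    xy = [x * y for x, y in zip(xs, ys)]  # the new XY column
    return sum(xy) - sum(xs) * sum(ys) / n

# Hypothetical data (not the lesson's data set).
X = [1, 2, 3, 4, 5]
Y = [9, 7, 8, 4, 2]

print(sum_of_squares(X))  # 55 - 225/5 = 10.0
print(sum_of_squares(Y))  # 214 - 900/5 = 34.0
print(ss_xy(X, Y))        # 73 - 450/5 = -17.0
```

Side by side, the two functions make the "variation on the theme" visible: `ss_xy` replaces each `v * v` with `x * y`, and replaces `sum(values) ** 2` with `sum(xs) * sum(ys)`.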
We create a new variable, XY, by multiplying every X by its Y partner.
Then, we sum each column. The sum of X = 42, the sum of Y = 72, and the sum of XY is 337.
Calculate the SS for X (136) and the SS for Y (256). Then calculate the SSxy. Multiply the sum of X by the sum of Y (42 * 72 = 3024). Now divide the result by N (the number of pairs of scores = 6): 3024/6 = 504. Subtract the result from the sum of the XYs (337 - 504 = -167).
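The SSxy arithmetic can be replayed in a few lines of Python using only the sums given in the text. This sketch covers just the SSxy step; the variable names are illustrative.

```python
# Sums taken from the worked example in the text.
n = 6          # number of pairs of scores
sum_x = 42
sum_y = 72
sum_xy = 337

correction = sum_x * sum_y / n   # 42 * 72 / 6 = 504.0
ss_xy = sum_xy - correction      # 337 - 504 = -167.0

print(correction, ss_xy)         # 504.0 -167.0
```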
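Notice the SSxy is negative. That’s OK. The SSxy can be negative; it is the only one of the three Sums of Squares that can be. The SSx and the SSy measure how far each variable’s scores spread out from that variable’s mean. They add up squared deviations, and a squared number can never be negative. The SSxy is different: XY is a variable we created, so SSxy is not a measure of dispersion. It adds up products of X deviations and Y deviations, and each product is negative whenever X is above its mean while Y is below its mean (or vice versa). That is why the sign of SSxy indicates the direction of the relationship between X and Y. We have a negative SSxy because X and Y have an inverse relationship.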
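The sign argument is easiest to see in the deviation form of SSxy, the sum of (X − mean of X)(Y − mean of Y), which is algebraically equal to the computational formula used in this lesson. The sketch below computes SSxy both ways on hypothetical data (not the lesson’s data set) where Y tends to fall as X rises. It also shows that SSx, being a sum of squares, has no negative terms.

```python
# Hypothetical data (not the lesson's data set): Y mostly falls as X rises.
X = [1, 2, 3, 4, 5]
Y = [9, 7, 8, 4, 2]
n = len(X)
mean_x = sum(X) / n   # 3.0
mean_y = sum(Y) / n   # 6.0

# Deviation form: each product is negative when X and Y
# sit on opposite sides of their means.
products = [(x - mean_x) * (y - mean_y) for x, y in zip(X, Y)]
ss_xy_deviation = sum(products)

# Computational form: sum of XY minus (sum of X)(sum of Y) / N.
ss_xy_computational = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

# SSx adds up squared deviations, so none of its terms can be negative.
ss_x = sum((x - mean_x) ** 2 for x in X)

print(ss_xy_deviation, ss_xy_computational)  # -17.0 -17.0
print(ss_x)                                  # 10.0
```

Every product in `products` is zero or negative here, which is exactly why the total comes out negative for an inverse relationship.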
Look at the original data: when X is small (2), Y is large (17). When X is large (13), Y is small (3). It is a consistent but inverse relationship. It’s like pushing the yoke down and the plane going up.
Let’s finish off the calculation of the Pearson r. Multiply the SSx by the SSy (136 * 256 = 34816). Take the square root of that number (sqrt of 34816 = 186.59). Divide the SSxy by that result (-167/186.59 = -.895). Rounding to 2 decimal places, the Pearson r for this data set equals -.90. It is a strong, negative correlation.
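The final steps can be checked in Python with the three Sums of Squares from the text. This is a sketch of the arithmetic only; note that Python prints the rounded value as `-0.9` rather than `-.90`.

```python
import math

# Sums of Squares from the worked example in the text.
ss_x, ss_y, ss_xy = 136, 256, -167

product = ss_x * ss_y               # 34816
denominator = math.sqrt(product)    # about 186.59
r = ss_xy / denominator             # about -0.895

print(product)                 # 34816
print(round(denominator, 2))   # 186.59
print(round(r, 3))             # -0.895
print(round(r, 2))             # -0.9
```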