The Phi coefficient (φ), known in machine learning as the Matthews correlation coefficient when applied to binary classification, measures the association between two binary variables. It is the Pearson correlation coefficient computed on two variables that are each coded as 0 or 1.
Here's how to calculate it:
1. Create a 2x2 Contingency Table:
Arrange your data into a 2x2 contingency table like this:
| | Variable Y = 1 | Variable Y = 0 | Row Totals |
|---|---|---|---|
| Variable X = 1 | A | B | A + B |
| Variable X = 0 | C | D | C + D |
| Column Totals | A + C | B + D | N (Total) |
Where:
- A = Number of cases where both X and Y are 1
- B = Number of cases where X is 1 and Y is 0
- C = Number of cases where X is 0 and Y is 1
- D = Number of cases where both X and Y are 0
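If the data arrive as two raw lists of 0/1 values rather than as a finished table, the four cells can be tallied directly. The following is a minimal sketch; the function name `contingency_counts` and the sample lists are illustrative assumptions, not part of any library.

```python
def contingency_counts(x, y):
    """Return (A, B, C, D) for two equal-length sequences of 0/1 values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi == 1 and yi == 1:
            a += 1          # both X and Y are 1
        elif xi == 1 and yi == 0:
            b += 1          # X is 1, Y is 0
        elif xi == 0 and yi == 1:
            c += 1          # X is 0, Y is 1
        elif xi == 0 and yi == 0:
            d += 1          # both X and Y are 0
        else:
            raise ValueError("values must be 0 or 1")
    return a, b, c, d


# Hypothetical sample data, made up for illustration.
x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 1, 0, 1, 0]
print(contingency_counts(x, y))  # (2, 1, 1, 2)
```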
2. Apply the Formula:
The Phi coefficient (φ) is calculated using the following formula:
φ = (AD - BC) / √((A+B)(C+D)(A+C)(B+D))
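The formula translates into a few lines of code. This is a sketch under one stated assumption: when any row or column total is zero the denominator is zero and φ is undefined, and returning NaN in that case is a choice made here for illustration, not a convention taken from the text above. The name `phi_coefficient` is likewise hypothetical.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with cells A, B, C, D as defined above."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        # A row or column total is zero, so phi is undefined.
        # Returning NaN is an assumption of this sketch.
        return float("nan")
    return (a * d - b * c) / denom


# (2*2 - 1*1) / sqrt(3*3*3*3) = 3 / 9
print(round(phi_coefficient(2, 1, 1, 2), 4))  # 0.3333
```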
3. Interpret the Result:
The Phi coefficient ranges from -1 to +1:
- +1: Perfect positive association (X and Y always take the same value, so B = C = 0)
- 0: No association (X and Y are independent in the sample, which happens exactly when AD = BC)
- -1: Perfect negative association (X and Y always take opposite values, so A = D = 0)
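The three reference points can be checked on small made-up tables. The sketch below uses a compact helper named `phi` (a hypothetical name) and assumes every row and column total is nonzero, so no zero-denominator guard is included.

```python
import math


def phi(a, b, c, d):
    # Assumes all four marginal totals are nonzero.
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


print(phi(10, 0, 0, 10))  # 1.0  -> X and Y always agree
print(phi(0, 10, 10, 0))  # -1.0 -> X and Y always disagree
print(phi(5, 5, 5, 5))    # 0.0  -> AD equals BC, no association
```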
Example:
Let's say we have the following data represented in a 2x2 contingency table:
| | Variable Y = 1 | Variable Y = 0 |
|---|---|---|
| Variable X = 1 | 5 | 10 |
| Variable X = 0 | 10 | 25 |
Then:
- A = 5
- B = 10
- C = 10
- D = 25
Applying the formula:
φ = (5 × 25 - 10 × 10) / √((5+10)(10+25)(5+10)(10+25))
φ = (125 - 100) / √((15)(35)(15)(35))
φ = 25 / √(275625)
φ = 25 / 525
φ ≈ 0.0476
In this example, the Phi coefficient is approximately 0.0476, indicating a very weak positive association between Variable X and Variable Y.
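As a cross-check, the same number can be obtained by expanding the table back into 50 individual 0/1 observations and computing an ordinary Pearson correlation on them. The sketch below does this with a hand-written `pearson` helper (a hypothetical name, written out so that nothing beyond the standard library is needed).

```python
import math

a, b, c, d = 5, 10, 10, 25

# Rebuild the raw observations: one (x, y) pair per counted case.
x = [1] * (a + b) + [0] * (c + d)
y = [1] * a + [0] * b + [1] * c + [0] * d


def pearson(u, v):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    var_u = sum((ui - mean_u) ** 2 for ui in u)
    var_v = sum((vi - mean_v) ** 2 for vi in v)
    return cov / math.sqrt(var_u * var_v)


phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
r = pearson(x, y)
print(round(phi, 4), round(r, 4))  # 0.0476 0.0476
```

The two values agree up to floating-point rounding, which is what treating φ as a Pearson correlation on binary data predicts.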
In summary, the Phi coefficient is a useful measure for quantifying the relationship between two binary variables, providing a single value that represents the strength and direction of the association.