Simpson’s Paradox: Explained in Simple Terms
Simpson’s Paradox occurs when a trend appears in several groups of data but disappears or reverses when the groups are combined.
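A real-life example of this paradox is “Kidney Stone Treatment”. Comparing the success rates of two treatments for kidney stones gives the following results:
- Group 1, Treatment A, small stones: 93% (81/87)
- Group 2, Treatment A, large stones: 73% (192/263)
- Group 3, Treatment B, small stones: 87% (234/270)
- Group 4, Treatment B, large stones: 69% (55/80)
- All stones: Treatment A 78% (273/350), Treatment B 83% (289/350)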
Based on the overall success rate, Treatment B is the obvious choice. Things get tricky when we segment the treatments by stone size: the result reverses, and Treatment A has the higher success rate for both small and large stones.
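The reversal is easy to reproduce. Below is a minimal Python sketch. It assumes the counts usually quoted for this example, given as (successes, cases): Treatment A succeeded in 81 of 87 small-stone cases and 192 of 263 large-stone cases, and Treatment B succeeded in 234 of 270 and 55 of 80. The `DATA` layout and the helper names are illustrative only.

```python
# Assumed (successes, cases) per treatment and stone size,
# using the counts commonly quoted for the kidney stone example.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def success_rate(successes: int, cases: int) -> float:
    """Fraction of cases that succeeded."""
    return successes / cases


def overall(groups: dict) -> tuple:
    """Pool the counts of every stone-size group for one treatment."""
    successes = sum(s for s, _ in groups.values())
    cases = sum(n for _, n in groups.values())
    return successes, cases


if __name__ == "__main__":
    # Within each stone size, Treatment A has the higher rate.
    for size in ("small", "large"):
        a = success_rate(*DATA["A"][size])
        b = success_rate(*DATA["B"][size])
        print(f"{size} stones: A {a:.0%}  B {b:.0%}")
    # After pooling, Treatment B comes out ahead.
    a_all = success_rate(*overall(DATA["A"]))
    b_all = success_rate(*overall(DATA["B"]))
    print(f"overall: A {a_all:.0%}  B {b_all:.0%}")
```

Running it prints 93% vs 87% for small stones, 73% vs 69% for large stones, and 78% vs 83% overall. Treatment A wins in each segment, yet Treatment B wins after pooling.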
Which treatment should we choose?
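Once the paradox is understood, the answer is Treatment A, whether the stone is small or large. It has the higher success rate within each stone-size group. The overall figure favors Treatment B only because of how the cases were distributed between the two treatments.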
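The article does not use this technique, but direct standardization is one way to make "Treatment A is better" concrete. It re-weights each treatment's per-group success rates by the same stone-size mix, so that the two treatments no longer differ in case severity. The sketch below uses the combined mix of all 700 patients as the shared weights. It assumes the same commonly quoted (successes, cases) counts, and `stone_mix` and `standardized_rate` are hypothetical helper names.

```python
# Assumed (successes, cases) per treatment and stone size.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def stone_mix(data: dict) -> dict:
    """Share of all patients (both treatments) in each stone-size group."""
    totals = {}
    for groups in data.values():
        for size, (_, cases) in groups.items():
            totals[size] = totals.get(size, 0) + cases
    grand_total = sum(totals.values())
    return {size: n / grand_total for size, n in totals.items()}


def standardized_rate(groups: dict, weights: dict) -> float:
    """Weighted average of per-group success rates using shared weights."""
    return sum(weights[size] * s / n for size, (s, n) in groups.items())


if __name__ == "__main__":
    mix = stone_mix(DATA)  # small: 357/700, large: 343/700
    for treatment, groups in DATA.items():
        print(treatment, f"{standardized_rate(groups, mix):.1%}")
```

With both treatments applied to the same mix of stones, Treatment A comes out at about 83.3% and Treatment B at about 77.9%, which agrees with the per-group comparison.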
When does this paradox happen?
- Different sample sizes. Groups 2 and 3 (Treatment A on large stones and Treatment B on small stones) contain far more cases than the other two groups, so the combined totals depend heavily on them.
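- Confounding variables. Stone size is a confounding variable here because it influences both which treatment a patient received (large stones were mostly treated with A, small stones with B) and the chance of success. The success rate depends more on the severity of the case (stone size) than on the treatment chosen, and success rates are higher for small stones under either treatment.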
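Both causes show up when each overall success rate is written as a weighted average of its group rates. In that form, the weight of each group is its share of that treatment's cases. The symbols $p_A$ and $p_B$ for the overall rates are introduced here, and the commonly quoted counts are assumed:

```latex
\begin{aligned}
p_A &= \frac{87}{350}\cdot\frac{81}{87} + \frac{263}{350}\cdot\frac{192}{263}
     = \frac{81 + 192}{350} = \frac{273}{350} \approx 0.78 \\
p_B &= \frac{270}{350}\cdot\frac{234}{270} + \frac{80}{350}\cdot\frac{55}{80}
     = \frac{234 + 55}{350} = \frac{289}{350} \approx 0.83
\end{aligned}
```

About 75% of Treatment A's cases are large stones (263/350), and about 77% of Treatment B's cases are small stones (270/350). As a result, A's overall rate is pulled toward its lower large-stone rate, while B's is pulled toward its higher small-stone rate.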
The next time you segment data:
- Look at the sample sizes alongside the percentages (Avinash Kaushik’s mantra).
- Look for factors that influence the data but are not shown.
- Draw causal diagrams to identify confounding variables.
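Part of the first checklist item can be automated. The minimal sketch below assumes that each option's data arrive as a dict mapping segment name to (successes, cases), with the same segments for both options. `find_reversal` and this input layout are hypothetical and for illustration only. The function prints the counts next to every percentage and reports whether the comparison flips after pooling. It cannot detect confounders that are missing from the data.

```python
def rate(successes: int, cases: int) -> float:
    """Fraction of cases that succeeded."""
    return successes / cases


def find_reversal(a: dict, b: dict) -> bool:
    """Return True if option a beats b in every segment but loses when the
    segments are pooled, or vice versa. Both dicts map segment -> (successes,
    cases) and are assumed to share the same segment keys."""
    segment_diffs = []
    for seg in a:
        ra, rb = rate(*a[seg]), rate(*b[seg])
        # Show raw counts next to each percentage.
        print(f"{seg}: a {ra:.0%} ({a[seg][0]}/{a[seg][1]})  "
              f"b {rb:.0%} ({b[seg][0]}/{b[seg][1]})")
        segment_diffs.append(ra - rb)
    pooled_a = rate(sum(s for s, _ in a.values()), sum(n for _, n in a.values()))
    pooled_b = rate(sum(s for s, _ in b.values()), sum(n for _, n in b.values()))
    print(f"pooled: a {pooled_a:.0%}  b {pooled_b:.0%}")
    pooled_diff = pooled_a - pooled_b
    a_wins_everywhere = all(d > 0 for d in segment_diffs)
    b_wins_everywhere = all(d < 0 for d in segment_diffs)
    return (a_wins_everywhere and pooled_diff < 0) or (
        b_wins_everywhere and pooled_diff > 0
    )


if __name__ == "__main__":
    treatment_a = {"small": (81, 87), "large": (192, 263)}
    treatment_b = {"small": (234, 270), "large": (55, 80)}
    print("reversal:", find_reversal(treatment_a, treatment_b))
```

Called on the kidney stone counts, it returns `True`. A result of `False` only means that no reversal appears across the segments you supplied. It does not show that no confounder exists.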