Two independent events become conditionally dependent given that at least one of them occurs. Symbolically:
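- If $0 < P(A) < 1$ and $0 < P(B) < 1$, with A and B independent (that is, $P(A \mid B) = P(A)$), then $P(A \mid B, A \cup B) = P(A)$ and hence $P(A \mid A \cup B) > P(A \mid B, A \cup B)$.
Proof: Note that $P(A \mid A \cup B) = \frac{P(A)}{P(A \cup B)}$ and $P(A \mid B, A \cup B) = P(A \mid B) = P(A)$,
which, together with $P(A) > 0$ and $P(A \cup B) < 1$ (so $\frac{P(A)}{P(A \cup B)} > P(A)$), implies that $P(A \mid A \cup B) > P(A \mid B, A \cup B)$. Here $P(A \cup B) < 1$ holds because, by independence, $P(\lnot A \cap \lnot B) = (1 - P(A))(1 - P(B)) > 0$.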
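The inequality can be checked with exact arithmetic. The following is a minimal sketch, assuming independent events with probabilities supplied as fractions; the function name `berkson_probabilities` and the sample values 1/3 and 1/4 are illustrative choices, not part of the original statement.

```python
from fractions import Fraction

def berkson_probabilities(p_a, p_b):
    """Exact probabilities for independent events A and B (illustrative helper)."""
    p_a, p_b = Fraction(p_a), Fraction(p_b)
    # P(A or B) under independence.
    p_union = p_a + p_b - p_a * p_b
    # A is contained in (A or B), so P(A | A or B) = P(A) / P(A or B).
    p_a_given_union = p_a / p_union
    # B is contained in (A or B), so conditioning on both is conditioning on B.
    p_a_given_b_and_union = (p_a * p_b) / p_b
    return p_a, p_a_given_union, p_a_given_b_and_union

p, given_union, given_b_union = berkson_probabilities(Fraction(1, 3), Fraction(1, 4))
print(p, given_union, given_b_union)  # 1/3 2/3 1/3
```

Passing `Fraction` values rather than floats keeps the comparison exact, so the strict inequality is not blurred by rounding.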
One can see this in tabular form as follows: the outcomes where at least one event occurs are the three cells other than ~A & ~B (and ~A means "not A").

|    | A      | ~A      |
|----|--------|---------|
| B  | A & B  | ~A & B  |
| ~B | A & ~B | ~A & ~B |
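For instance, if one has a sample of 100, and both A and B occur independently half the time ($P(A) = P(B) = 1/2$), one obtains:

|    | A  | ~A |
|----|----|----|
| B  | 25 | 25 |
| ~B | 25 | 25 |

So in 75 outcomes, either A or B occurs, of which 50 have A occurring. By comparing the conditional probability of A to the unconditional probability of A:

$$P(A \mid A \cup B) = \frac{50}{75} = \frac{2}{3} > P(A) = \frac{50}{100} = \frac{1}{2}$$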
We see that the probability of A is higher (2/3) in the subset of outcomes where (A or B) occurs than in the overall population (1/2). On the other hand, the probability of A given both B and (A or B) is simply the unconditional probability of A, $P(A)$, since A is independent of B. In the numerical example, we have conditioned on being in the top row:

|   | A  | ~A |
|---|----|----|
| B | 25 | 25 |

Here the probability of A is 25/50 = 1/2.
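The three probabilities in the numerical example can be recomputed directly from a table of counts. This is a minimal sketch, assuming a sample of 100 with 25 outcomes in each cell (A and B each occurring independently half the time); the helper `prob` and the predicate names are hypothetical.

```python
from fractions import Fraction

# Assumed outcome counts: 100 samples, 25 per cell.
# Keys are (A occurs, B occurs).
counts = {
    (True, True): 25,
    (False, True): 25,
    (True, False): 25,
    (False, False): 25,
}

def prob(event, given=lambda a, b: True):
    """P(event | given) from the counts table; both arguments are predicates on (a, b)."""
    denom = sum(n for (a, b), n in counts.items() if given(a, b))
    numer = sum(n for (a, b), n in counts.items() if given(a, b) and event(a, b))
    return Fraction(numer, denom)

is_a = lambda a, b: a

p_a = prob(is_a)                                             # 50 / 100
p_a_given_union = prob(is_a, lambda a, b: a or b)            # 50 / 75
p_a_given_b_union = prob(is_a, lambda a, b: b and (a or b))  # 25 / 50

print(p_a, p_a_given_union, p_a_given_b_union)  # 1/2 2/3 1/2
```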
Berkson's paradox arises because the conditional probability of A given B within the three-cell subset equals the conditional probability in the overall population, but the unconditional probability of A within the subset is inflated relative to the unconditional probability in the overall population. Hence, within the subset, the presence of B decreases the conditional probability of A (back to its overall unconditional probability):

$$P(A \mid B, A \cup B) = P(A \mid B) = P(A)$$
$$P(A \mid A \cup B) > P(A)$$
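The same effect shows up in simulated data. The sketch below is only an illustration under assumed settings: it draws independent events with probability 0.5 each, keeps the trials where at least one occurs, and compares the frequency of A with and without B inside that subset. The function name `simulate`, the trial count, and the seed are arbitrary choices.

```python
import random

def simulate(p_a=0.5, p_b=0.5, trials=200_000, seed=0):
    """Frequencies of A within the subset of trials where A or B occurs."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        a = rng.random() < p_a
        b = rng.random() < p_b   # drawn independently of a
        if a or b:               # selection: keep only "at least one occurs"
            kept.append((a, b))
    n_b = sum(1 for a, b in kept if b)
    n_not_b = len(kept) - n_b
    freq_a = sum(1 for a, b in kept if a) / len(kept)
    freq_a_given_b = sum(1 for a, b in kept if a and b) / n_b
    freq_a_given_not_b = sum(1 for a, b in kept if a and not b) / n_not_b
    return freq_a, freq_a_given_b, freq_a_given_not_b

freq_a, freq_a_given_b, freq_a_given_not_b = simulate()
print(freq_a, freq_a_given_b, freq_a_given_not_b)
```

With these settings the first value should come out near 2/3 and the second near 1/2, while the third is exactly 1: inside the selected subset, any trial without B must contain A, which is the source of the induced negative dependence.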
Because the effect of conditioning on $(A \cup B)$ derives from the relative size of $P(A \mid A \cup B)$ and $P(A)$, the effect is particularly large when A is rare ($P(A) \ll 1$) but very strongly correlated to B ($P(A \mid B) \approx 1$). For example, consider the case below where N is very large:

|    | A | ~A |
|----|---|----|
| B  | 1 | 0  |
| ~B | 0 | N  |
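For the case without conditioning on $(A \cup B)$ we have

$$P(A) = \frac{1}{N+1}, \qquad P(A \mid B) = 1$$

So A occurs rarely unless B is present, in which case A always occurs. Thus B is dramatically increasing the likelihood of A.
For the case with conditioning on $(A \cup B)$ we have

$$P(A \mid A \cup B) = 1, \qquad P(A \mid B, A \cup B) = 1$$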
Now A occurs always, whether B is present or not. So B has no impact on the likelihood of A. Thus we see that for highly correlated data a huge positive correlation between B and A can be effectively removed when one conditions on $(A \cup B)$.
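The rare-event case can be worked through numerically as well. This sketch rests on an assumption about the cell counts, which are not given explicitly here: one outcome with both A and B, none with exactly one of them, and N with neither. The names `conditional` and `rare_case` are hypothetical.

```python
from fractions import Fraction

def conditional(counts, event, given):
    """P(event | given) from a counts table keyed by (A occurs, B occurs)."""
    denom = sum(n for cell, n in counts.items() if given(*cell))
    numer = sum(n for cell, n in counts.items() if given(*cell) and event(*cell))
    return Fraction(numer, denom)

def rare_case(n):
    # Assumed counts: A and B co-occur once, neither occurs n times.
    counts = {
        (True, True): 1,
        (False, True): 0,
        (True, False): 0,
        (False, False): n,
    }
    is_a = lambda a, b: a
    return {
        "P(A)": conditional(counts, is_a, lambda a, b: True),
        "P(A|B)": conditional(counts, is_a, lambda a, b: b),
        "P(A|A or B)": conditional(counts, is_a, lambda a, b: a or b),
        "P(A|B, A or B)": conditional(counts, is_a, lambda a, b: b and (a or b)),
    }

for name, value in rare_case(10**6).items():
    print(name, "=", value)
```

Under these assumed counts, only the unconditional P(A) depends on N; the other three quantities equal 1, so the apparent effect of B on A vanishes once one conditions on (A or B).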