the n.f.c. coin toss hotstreak problem
statigrafix complexity notes
explanations for the masses series [e4m]
due diligence and data analysis series [dd&da]
as of the steelers' 6th ring [super bowl xliii], national conference teams have won the last 12 super bowl coin tosses and 29 of 43:
i ii iii iv v vi vii viii ix x xi xii xiii xiv xv xvi xvii xviii xix xx xxi xxii xxiii xxiv xxv xxvi xxvii xxviii xxix xxx xxxi xxxii xxxiii xxxiv xxxv xxxvi xxxvii xxxviii xxxix xl xli xlii xliii
so? winning tosses hasn't translated into rings. and 12 in a row is merely an emergent random cluster, a large deviation that should only interest gamblers and sports trivia types. what's the problem?
tossing coins, whether a fixed coin or a series of specially minted ones as in super bowls, is the most trivial stochastic process imaginable. the binomial distribution applies not just approximately, but exactly. for instance it yields explicitly the chance of 12 straight nfc coin wins at 1 in 4096 [1 in 2048 if a streak by either conference counts]. but "nominal values" of random variables, when plugged into simple functions, may not give an accurate picture if dependencies are present. and maybe we want to test whether dependencies are present.
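a minimal sketch of the forward calculation, assuming a fair toss [p = 1/2] and independent tosses. the function names are illustrative and not from any library. it computes the chance that 12 pre-specified tosses all go to one named conference [1/2 to the 12th power, or 1 in 4096; doubling it gives 1 in 2048 for a streak by either conference] and the exact binomial chance of 29 or more nfc wins in 43.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Exact binomial probability of k successes in n independent tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_upper_tail(k, n, p=0.5):
    """Exact probability of k or more successes in n independent tosses."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

# 12 pre-specified tosses all won by one named conference
p_streak = 0.5 ** 12

# 29 or more wins for one named conference out of 43 fair tosses
p_29_plus = binom_upper_tail(29, 43)

print(f"12 straight for a named conference: 1 in {1 / p_streak:.0f}")
print(f"12 straight for either conference:  1 in {1 / (2 * p_streak):.0f}")
print(f"29 or more wins out of 43:          {p_29_plus:.4f}")
```

both numbers are "nominal" in the sense used above: they are only as good as the fairness and independence assumptions fed into them.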
in other words, the binomial model gives the forward [ie, easy] direction while inverse [ie, data analysis] problems are ill-posed and assumptions need to be introduced?
unfortunately, the power of probability comes from being able to average together enough similar "things" like tosses, via various "laws of large numbers" and "central limit" theorems. this is not to say that there are no methods to deal with finite processes. those do exist [eg chernoff bounds], but the samples must be studied in the context of the process, not as disembodied data points. in the case of the super bowl tosses, the process is not replicable, hence there is no way to test causal mechanisms, only statistical patterns.
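as an illustration of the finite-sample tools mentioned above, here is a sketch comparing the exact tail probability of 29 or more wins in 43 fair tosses with two standard upper bounds: the hoeffding bound exp(-2t^2/n) and the chernoff bound in its relative-entropy form exp(-n * KL(k/n || 1/2)). the function names are made up for this example, and fairness and independence are assumed throughout.

```python
from math import comb, exp, log

def exact_upper_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

def hoeffding_bound(k, n):
    """Hoeffding upper bound on P(X >= k), valid for k >= n/2."""
    t = k - n / 2
    return exp(-2 * t * t / n)

def chernoff_kl_bound(k, n):
    """Chernoff upper bound exp(-n * KL(k/n || 1/2)), valid for n/2 <= k <= n."""
    a = k / n
    if a == 1.0:
        return 0.5 ** n  # limiting case: every toss is a win
    kl = a * log(a / 0.5) + (1 - a) * log((1 - a) / 0.5)
    return exp(-n * kl)

n, k = 43, 29
print(f"exact tail:      {exact_upper_tail(k, n):.4f}")
print(f"chernoff bound:  {chernoff_kl_bound(k, n):.4f}")
print(f"hoeffding bound: {hoeffding_bound(k, n):.4f}")
```

the bounds hold for any finite n without appeal to a limit theorem, at the price of being looser than the exact value.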
the outcomes of the [calls of] tosses are independent samples, meaning that no two tosses influence one another and there is no influence from a parent process, for example one that controls the motion of the coin in the air. in other words, here we have none of the problems inherent in more complex processes such as in vivo biology or finance, where hundreds or millions of variables or factors influence one another to some degree.
yet based on the observed outcomes, suppose you consider the hypothesis that nfc teams have formed a cabal and are getting better at telekinesis, and you test it, for example by testing independence: you'll find that statistical inference cannot reject it, and with high confidence at that.
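one hedged way to see why a naive test would side with the cabal: under the fair, independent model, compute the exact chance that a run of at least 12 nfc wins shows up anywhere in 43 tosses. the sketch below uses a small dynamic program over the length of the current trailing run. the function name and the choice of run length as test statistic are assumptions made for this example, not a method taken from the text.

```python
def prob_run_at_least(r, n, p=0.5):
    """Exact P(some run of >= r consecutive wins in n independent tosses)."""
    # state[j] = P(no run of length r seen so far, current trailing run == j)
    state = [0.0] * r
    state[0] = 1.0
    for _ in range(n):
        new = [0.0] * r
        for j, mass in enumerate(state):
            new[0] += mass * (1 - p)      # a loss resets the trailing run
            if j + 1 < r:
                new[j + 1] += mass * p    # a win extends the trailing run
            # otherwise the run reaches r and this mass leaves the "no run" set
        state = new
    return 1 - sum(state)

p_run = prob_run_at_least(12, 43)
print(f"P(run of 12+ nfc wins somewhere in 43 fair tosses) = {p_run:.5f}")
```

the result is well under 1 percent, so a test built on this statistic would reject "fair and independent" at the usual thresholds, even though we know the process is exactly that.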
but why would you consider such a ridiculous hypothesis?
because in general you might not know whether the hypothesis is ridiculous. correlative statistics runs into a brick wall when you attempt to interpret results as causal mechanisms. judea pearl points out: a correlation between rain and mud doesn't imply that mud caused rain. see >> for additional details.
the small # of nonreplicable samples, like the 43 tosses here, is not unusual. in biomedicine in particular, scientists try to infer patterns relating, for example, gene expression profiles to disease states, based on evidence from as few as 30 patients [due to the high current costs of the technology]. if large deviations and spurious clusters emerge in this most transparent of processes, with cameras all around and millions watching, there is no a priori reason to preclude the existence of "spurious" patterns in more complex contexts. but unlike here, where we immediately recognize the spurious nature of the clusters, it's not in general easy to spot the unicorns.
going back to the concept of independence, any other way to understand it?
suppose you have two coins in your pocket that look & feel indistinguishable but coin a is biased in favor of heads, coin b is fair. you pick a coin at random and get heads in 10 consecutive tosses. what is the chance of getting heads on the next toss?
if you knew which coin had been selected, this probability would either be high [a] or 1/2 [b] independently of the # of previous heads.
however, the coin is unknown and this induces a dependency between the outcomes of the 10 tosses and the next one. you have a strong suspicion that you have picked coin [a] which increases your belief that you will get heads on the next toss.
it is only our state of knowledge about this probability that depends on the previous outcomes: given the coin, the tosses remain independent. this epistemic-ontological divide makes probability, statistics, and the data analysis processes that leverage those fields challenging to debug, even in the simplest engineered stochastic processes.
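the two-coin example can be worked through with bayes' rule. the text does not say how biased coin a is, so the sketch below assumes, purely for illustration, that coin a lands heads with probability 0.75 and that each coin is equally likely to be picked. the function names are hypothetical.

```python
P_BIASED = 0.75  # assumed heads probability of coin a (not given in the text)

def posterior_biased(heads, tails, p_biased=P_BIASED, prior_biased=0.5, p_fair=0.5):
    """P(coin a was picked | observed heads and tails), by Bayes' rule."""
    like_a = p_biased**heads * (1 - p_biased)**tails
    like_b = p_fair**heads * (1 - p_fair)**tails
    num = prior_biased * like_a
    return num / (num + (1 - prior_biased) * like_b)

def predictive_heads(heads, tails, p_biased=P_BIASED, prior_biased=0.5, p_fair=0.5):
    """P(next toss is heads | observed heads and tails), coin identity unknown."""
    w = posterior_biased(heads, tails, p_biased, prior_biased, p_fair)
    return w * p_biased + (1 - w) * p_fair

for h in (0, 1, 5, 10):
    print(f"after {h:2d} heads: P(coin a) = {posterior_biased(h, 0):.4f}, "
          f"P(next heads) = {predictive_heads(h, 0):.4f}")
```

before any toss the predictive probability is the prior average of the two coins [0.625 under these assumptions]. after 10 straight heads it sits just below 0.75. the coin never changed, only the weight we put on each candidate.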