independence of events and random variables is a stronger condition than absence of correlation, which only takes into account the first 2 moments of the distributions, i.e. the linear part. it's the lever that allows - when present - the breakdown of problems into subproblems. if everything depended on everything else, little could be analyzed. on the other hand, independence is often assumed simply to make analysis tractable. in classical statistical analysis, i.e. the fisher and neyman-pearson-wald schools, as well as in modern nonparametrics, independence is a crucial assumption that pervades the methodology, e.g. it permits invocation of results such as central limit theorems.

this visualization, based on a classic counterexample [roma1986], is the simplest one illustrating that random variables can look independent when considered in subsets, yet still be mutually dependent. this situation has analogs elsewhere. in database concurrency control, a process schedule may appear serializable if processes are considered pairwise, yet the schedule may not be serializable overall. the borromean ring configuration is a geometric manifestation of this abstract concept: if any one of the three rings were missing, the remaining two would be unlinked. in other words, the rings are only coupled together by threes, not pairwise.

borromean-rings.jpg

the example is constructed from 2 coin tosses and 3 binary random variables: X, Y, Z. X=1 if the 1st toss is heads, else X=0. Y=1 if the 2nd toss is heads, else Y=0. Z=1 if the two tosses result in exactly 1 heads, else Z=0. although X, Y are pairwise independent, and so are Y, Z and X, Z, it's clear that X, Y, Z cannot be mutually independent: any two of them determine the third. Z is purposely chosen to play the role of a statistical "constraint", although not on X or Y individually. the relationship among the associated distributions (i.e. tables, as the variables are categorical) is not illuminated by this verbal description.

the visualization shows how dependencies in the joint distribution XYZ can wash out in marginalization, i.e. in the lower-dimensional projections. specifically, the marginals of X (along the red axis), Y (blue axis), Z (green axis) are represented by their 2-cell 1-dimensional histograms (each shown twice for symmetry); the 2D joint distributions of XY, YZ, XZ are represented by 4-cell histograms (2x2 tables); the 8-cell histogram (2x2x2 table) forming the center wire-cube represents the joint distribution of XYZ. cube volumes are proportional to cell probability. to make the visualization easier to parse, the coin has been biased to favor heads 9:1, but these values are suppressed for clarity since they are not necessary to understand the situation. (strictly, the YZ and XZ tables factor exactly only when the coin is fair.) marginalization corresponds to summing the cells (i.e. cubes) along the dashed and dotted lines from the joints towards the marginals.

by definition, the joint distribution of /independent/ random variables is the (outer) product of the marginals. so, e.g., that X and Y are independent is obvious by visual inspection: the XY 2x2 table is the outer product of the X and Y marginals. it's also easy to see that the joint XYZ is not the outer product of the X, Y, and Z marginals: half of its cells are empty, whereas the product would fill all eight. thus X, Y, Z are mutually dependent.
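
the construction above can be checked mechanically. what follows is a minimal sketch, not part of the original visualization: the helper names `joint_xyz`, `marginal` and `factorizes` are made up for illustration, axes 0, 1, 2 stand for X, Y, Z, and the coin is assumed fair by default. it builds the 2x2x2 table, marginalizes by summation, and compares each table against the outer product of its 1-dimensional marginals, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product

def joint_xyz(p=Fraction(1, 2)):
    """Joint table P(X=x, Y=y, Z=z) for two tosses with P(heads) = p (assumed fair by default)."""
    table = {}
    for x, y in product((0, 1), repeat=2):
        px = p if x else 1 - p
        py = p if y else 1 - p
        for z in (0, 1):
            # Z is 1 exactly when one of the two tosses is heads, i.e. z = x XOR y
            table[(x, y, z)] = px * py if z == (x ^ y) else Fraction(0)
    return table

def marginal(table, keep):
    """Sum out every axis not listed in `keep` (a tuple of axis indices)."""
    out = {}
    for cell, prob in table.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + prob
    return out

def factorizes(table, axes):
    """True if the table over `axes` equals the outer product of its 1-D marginals."""
    joint = marginal(table, axes)
    singles = [marginal(table, (a,)) for a in axes]
    for key, prob in joint.items():
        expected = Fraction(1)
        for single, value in zip(singles, key):
            expected *= single[(value,)]
        if prob != expected:
            return False
    return True

table = joint_xyz()
# pairs (0, 1), (1, 2), (0, 2) are XY, YZ, XZ
pairwise = {pair: factorizes(table, pair) for pair in [(0, 1), (1, 2), (0, 2)]}
mutual = factorizes(table, (0, 1, 2))
print(pairwise, mutual)
```

for the fair coin every pairwise check succeeds and the three-way check fails. calling `joint_xyz(Fraction(9, 10))` is a quick way to see what bias does: XY still factors, while the same check reports that XZ and YZ no longer factor exactly.
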
testing deviation from independence is the domain of the nonparametric chi-squared statistic, which simply computes the marginals by summation as described above, then compares the (synthetic) product of the marginals against the observed joint. unfortunately, in higher dimensions chi-squared is ill-conditioned with respect to rare events.

more importantly, combinatorial, measure-theoretical, and topological properties of high-dimensional objects are counterintuitive. to give but one example, on high-dimensional spheres uniform distributions are preferentially concentrated along equators. any equator! [ledo2001]. far from academic, these issues are central to high-dimensional computational optimization, e.g. monte-carlo and genetic algorithms.

this visualization also illustrates that marginalization is a forward problem, whereas determination of the joint from the marginals is an inverse problem. MIT linguist steven pinker's informal assertion that inverse problems are unsolvable because they are ill-posed is incorrect. they are solvable, but not easily and not by intuitive methods [vapn2000]. computed tomography in medical imaging and in geology is one example. the geometric interpretation described here can also be thought of as a simple tomography problem. a more abstract but no less applicable analogy is with borromean rings.
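
the chi-squared comparison described above can be sketched in a few lines. this is an illustration under stated assumptions, not a reference implementation: the function name `chi_squared` and the counts are hypothetical (an idealized 400 pairs of fair tosses, 100 per reachable cell), the variables are binary, and every marginal count is assumed nonzero so that no expected cell is 0.

```python
from itertools import product

def chi_squared(counts, axes=(0, 1, 2)):
    """Pearson chi-squared statistic against the product of 1-D marginals.

    counts maps (x, y, z) -> observed count; axes selects which of X, Y, Z
    (indices 0, 1, 2) to test, the remaining axes being summed out first.
    """
    n = sum(counts.values())
    # observed table restricted to the chosen axes (marginalization by summation)
    observed = {}
    for cell, c in counts.items():
        key = tuple(cell[a] for a in axes)
        observed[key] = observed.get(key, 0) + c
    # 1-D marginal counts for each chosen axis
    margins = []
    for i in range(len(axes)):
        m = [0, 0]
        for key, c in observed.items():
            m[key[i]] += c
        margins.append(m)
    stat = 0.0
    for key in product((0, 1), repeat=len(axes)):
        # expected count under independence: n times the product of marginal frequencies
        expected = float(n)
        for i, v in enumerate(key):
            expected *= margins[i][v] / n
        stat += (observed.get(key, 0) - expected) ** 2 / expected
    return stat

# hypothetical idealized counts: only cells with z = x XOR y are ever observed
counts = {(0, 0, 0): 100, (0, 1, 1): 100, (1, 0, 1): 100, (1, 1, 0): 100}
pair_stats = [chi_squared(counts, axes) for axes in [(0, 1), (1, 2), (0, 2)]]
triple_stat = chi_squared(counts)
print(pair_stats, triple_stat)
```

on these idealized counts the three pairwise statistics are 0, while the three-way statistic is large: each of the 8 cells has an expected count of 50, but the observed counts are either 100 or 0. for a 2x2x2 table tested for mutual independence the statistic has 4 degrees of freedom.
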