the phalanx friendly fire problem & tukey's warning
statigrafix complexity notes
explanations for the masses series [e4m]
due diligence and data analysis series [dd & da]


[image: phalanx-friendly-fire.jpg]

the combination of some data
and an aching desire for an answer does not ensure
that a reasonable answer can be extracted from a given body of data
j.w. tukey / "sunset salvo" / the american statistician, 1986


what is phalanx, and what is its problem?

phalanx is one of the u.s. navy's key close-in "sea-whiz" [ciws] weapons - an automated gatling gun that shoots waves of tungsten-tipped rounds at incoming air targets, including fast-moving missiles, and in its latest incarnation even surface and diving targets.

phalanx illustrates the wide gap between two closely related mathematical fields: control theory on the one hand, and pattern recognition on the other.

an interesting distinction because, as vladimir vapnik - a pioneer of machine learning - points out, in the soviet union machine learning became somewhat detached from statistics programs and more closely linked with control systems.

one of the features of san diego is the occasional bar conversation with sailors, in which i explore the boundary between classified and not. for example, since i'm interested in data visualization, i asked a sonar operator how many pixels - in other words, what resolution - his submarine monitor that displays sonar patterns has. turns out that's classified.

a weapons radar tech on a carrier who is familiar with phalanx told me that ever since the 1996 war game incident in which a japanese destroyer's phalanx destroyed an american a-6 jet, navy pilots have been apprehensive of phalanx's engagement zone: its tracking is so accurate in automatic mode that it may shoot a towed target and continue to blast along the tow cable.

and in 1987, an iraqi anti-ship missile hit the u.s.s. stark, which was equipped with a phalanx that had been turned off for fear that a friendly ship might trigger it.

in other words, it doesn't appear that there's a practical solution to the discrimination problem.

despite its tracking control performance and sophisticated engagement algorithms [search, detection, threat evaluation, acquisition, track, firing, target destruction, kill assessment and cease-fire modes], it is difficult for phalanx to discriminate between friendly and unfriendly targets - which is not a control task but a pattern recognition task, although both are measured in terms of accuracy and error.

but is that a failure of algorithms, or a limitation due to the nature of the available sensor data streams?

a classification algorithm is a morphism from the input data to the output class. such a classifier may confuse things even when the input data is informative, but in that case there's hope that improved algorithms will correct the deficiency. however, if the input data is not sufficiently informative, no algorithm can yield accurate classification.

the data patterns associated with the categories to be distinguished must be sufficiently distinct - if the so-called "interclass overlap" is too high, no algorithm will be able to accurately discriminate.

in the case of phalanx, if friendly and unfriendly targets' radar profiles and motion patterns - eg approach velocities - are similar, no algorithm can be expected to distinguish them. is that a hummingbird or a stealth b2 coming your way? discrimination may still not be possible even when radar is augmented with flir or video, which the latest phalanx platforms carry, although those data streams can be combined to reduce uncertainty.
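a minimal numerical sketch of that limit - assuming a purely hypothetical setup with a single one-dimensional feature, say approach velocity, whose friendly and unfriendly distributions overlap - shows how interclass overlap caps the accuracy of even the optimal decision rule (python):

# hypothetical one-dimensional feature with overlapping class distributions;
# no classifier can do better than the bayes-optimal threshold between them
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 100_000
sep = 1.0                                # separation between class means (made-up units)

x_friend = rng.normal(0.0, 1.0, n)       # feature values for "friendly" targets
x_foe    = rng.normal(sep, 1.0, n)       # feature values for "unfriendly" targets

# with equal priors and equal variances, the optimal rule is a threshold
# halfway between the class means
threshold = sep / 2.0
accuracy = 0.5 * np.mean(x_friend < threshold) + 0.5 * np.mean(x_foe >= threshold)

bayes_error = norm.cdf(-sep / 2.0)       # the theoretical floor on the error rate
print(f"empirical accuracy of the optimal rule: {accuracy:.3f}")
print(f"best achievable accuracy (1 - bayes error): {1 - bayes_error:.3f}")
# at this degree of overlap both numbers sit near 0.69 - no algorithm, however
# clever, can push past that without more informative data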

how about a toy example illustrating the above ideas?

here's a somewhat artificial example which hopefully drives the point home. let's say i want to automatically distinguish real company names from made-up ones. so we're talking about 2 classes, real and fake - also called binary classification.

in a supervised classification approach, the classifier algorithm is fed a "training" dataset consisting of a mixed bag of instances, each labeled with its known class. for instance, in the following training list, "+" labels real firms and "-" labels made-up ones. however, this is the only information the algorithm gets - it is not allowed to google. (a sketch of this training set in code follows the list.)

acxiom +
omnicom +
omnicorp -
innogroup -
dickstein shapiro llp +
divisa -
altria +
microsoft +
tapco +
accelerys +
genalytics +
infosense +
secusoft -
autentech -
macrocom -
apple +
orange -
interlink -
physicorp -
energene +
genzyme +
gentech -
...
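
for concreteness, here's the same training list encoded the way a learning algorithm would receive it - nothing but the (name, label) pairs from above - as a python sketch:

# the toy training set: (name, label) pairs, "+" = real firm, "-" = made up
training = [
    ("acxiom", "+"), ("omnicom", "+"), ("omnicorp", "-"), ("innogroup", "-"),
    ("dickstein shapiro llp", "+"), ("divisa", "-"), ("altria", "+"),
    ("microsoft", "+"), ("tapco", "+"), ("accelerys", "+"), ("genalytics", "+"),
    ("infosense", "+"), ("secusoft", "-"), ("autentech", "-"), ("macrocom", "-"),
    ("apple", "+"), ("orange", "-"), ("interlink", "-"), ("physicorp", "-"),
    ("energene", "+"), ("genzyme", "+"), ("gentech", "-"),
]
names  = [name for name, label in training]
labels = [label for name, label in training]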

that's it? no more examples?

depending on the domain, the number of training examples varies greatly. in clinical research, the size of the training set is limited by the expense of running studies, so i often see datasets with as few as 30 individuals. there the classification is often in terms of clinical outcomes, eg healthy vs diseased, or different disease states.

in the netflix recommendation engine challenge, the training set runs to roughly 100 million ratings. commercial analytics firms like visa decision sciences may see 100m transactions per day, which they seek to classify as ordinary or fraudulent.

an algorithm may perform whatever operations on the input data - but first and foremost, the data is mapped into a feature space.

in this toy example what is the nature of the feature space?

there is no unique answer. maybe it's the space of character strings, maybe of common root words and suffixes. it's not clear in general either, and that's one of the most challenging aspects of machine learning. it's a modeling decision - not a mathematically deducible issue, but rather an issue at the interface between the application domain and mathematics.
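
to make that concrete, here's a sketch of two of the many possible feature maps over the same names - character n-gram counts versus indicators for a few hand-picked root and suffix fragments. neither choice is dictated by the data, and the fragment list is purely illustrative. this continues the python sketch started after the training list, so "names" refers to the list defined there:

# two alternative feature spaces for the same names - a modeling decision,
# not something deduced mathematically from the data
from sklearn.feature_extraction.text import CountVectorizer

# feature space 1: counts of character bigrams and trigrams
char_ngrams = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))
X_ngrams = char_ngrams.fit_transform(names)

# feature space 2: presence/absence of a few hand-picked root/suffix fragments
fragments = ["corp", "soft", "tech", "gen", "com", "co", "link", "sense"]
def fragment_features(name):
    return [int(fragment in name) for fragment in fragments]
X_fragments = [fragment_features(name) for name in names]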

so pattern recognition algorithms are only formalized after the feature space is decided upon?

yes. and here, both real and made-up firm names are, after all, "made up" - outputs of human imagination. because of this mimicry, no matter what feature space you decide on, you will likely see real and made-up firm names geometrically intermingled - the dreaded "interclass overlap" problem. there's no principle by which to disambiguate them.

the accuracy of the classifier is measured on a fresh batch of "test" data for which the label is not provided - either because it's unknown in principle, or because it's removed specifically for the test.

so now the algorithm is tested on, eg, "compuinc" and "genentech", and performance is computed in terms of accuracy, which decomposes into "sensitivity" and "specificity" - roughly: how many guilty did you let go, and how many innocents did you convict. or: how many unfriendlies did phalanx let through, and how many friendlies did it shoot at.
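
continuing the same sketch: train on the character n-gram features and test on the two fresh names. logistic regression is an arbitrary stand-in for "whatever classifier"; genentech happens to be a real firm and compuinc a made-up one - labels the algorithm never sees until scoring:

# fit an arbitrary classifier on the training names, then score a fresh
# test batch whose labels are withheld from the algorithm
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000).fit(X_ngrams, labels)

test_names  = ["compuinc", "genentech"]
test_labels = ["-", "+"]                 # ground truth, hidden from the classifier
predictions = clf.predict(char_ngrams.transform(test_names))

# sensitivity: fraction of real firms recognized as real
# specificity: fraction of made-up firms recognized as made up
def rate(true, pred, cls):
    total = sum(1 for t in true if t == cls)
    hits = sum(1 for t, p in zip(true, pred) if t == cls and p == cls)
    return hits / total if total else float("nan")

print("predictions:", list(predictions))
print("sensitivity:", rate(test_labels, predictions, "+"))
print("specificity:", rate(test_labels, predictions, "-"))
# with training classes this intermingled, don't expect these numbers to be
# much better than a coin flip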

how can any algorithm be expected to accurately classify "compuinc" and "genentech" given the complex overlap of training data?

this thought experiment illustrates a fundamental limitation of pattern recognition - and possibly why phalanx cannot be expected to discriminate friend from foe based on the available data, no matter how sophisticated pattern recognition methodology becomes, and orthogonally to its tremendous tracking accuracy.


alan calvitti phd
head of research
statigrafix