San Diego 'Analytics' Growth Industry, But Is It Just Pie-Charts?

Alan Calvitti, PhD

The San Diego Software Industry Council hosted a "Forum on Analytics" in Del Mar this Tuesday, aiming to raise awareness of San Diego's potential to be a leading hub for this high-tech growth industry.

The audience of more than 200 scientists, software developers, venture capitalists, intellectual property lawyers and sales and marketing people was ringed by technical posters, primarily from UCSD's Jacobs School of Engineering. Overheard conversations ranged from the impact of digital media and social network sites on artists to the limitations of "L2 norms" and "Gaussian assumptions" in data mining.

More than one question put to the panelists asked them to better define "Analytics." The panel included executives from Fair Isaac, ID Analytics, Salford Systems, RealAge, Teradata, Visual Sciences, Veoh, and Zementis, firms either founded in San Diego or with a strong engineering presence here.

Although there was no consensus, one response was that manipulating Excel pivot tables may constitute "Business Intelligence" or "Reporting" but is too simplistic to be called Analytics. The latter comprises modern computational statistics methodology, instantiated in algorithms that are in turn implemented in software, often to enable real-time or near-real-time automation and decision making, e.g. credit card fraud detection or credit scoring.

It also includes development of data management infrastructure to enable collection from various sources - typically, data is distributed across an enterprise in multiple databases. Up to 80 percent of the analytics effort facing firms may in fact consist of organizing and managing data, according to one of the panelists. This may explain why new hires who respond to job ads seeking PhD-level scientists and modelers to work on "machine learning" wind up parsing XML files and writing "IT glue" instead.

The crunching is not confined to numbers. Mining knowledge from text, images and video streams presents unique challenges. In mining text from unstructured documents, whether corporate email repositories or blogs, linguistic problems crop up, not only because computational linguistics is still a developing field, but also simply because of the global reach of the internet and the internationalization of English: a given term or phrase may have clashing, idiomatic interpretations in Singapore and the UK. Ontologies, tools used to organize terminology that can in principle be embedded in data mining algorithms, are too domain-specific to be helpful. Even within a given domain, such as health care, ontologies are considered too unwieldy and inflexible to allow a significant degree of automated decision support, due to the large, specialized vocabularies involved.


Data and information visualization, as methodologies to aid decision-makers, had minimal visibility at the forum. Amanda Reed, a partner with Palomar Ventures, a California-based technology venture capital firm, asked the panelists whether they would "continue to provide pie charts" rather than "innovate in visualization". Of the firms represented on the panel, only Visual Sciences and Salford Systems seemed to have core products or services focusing on graphical analysis tools. Marvel It, a recent startup, also develops web-based Java dashboards, according to its CEO, Rick Mortensen, who attended the conference.

In 1993, Larry Smarr, currently director of the California Institute for Telecommunications and Information Technology, a multidisciplinary research center based at UC San Diego and UC Irvine, recognized that humans are becoming "exponentially immersed" in data and that "visual output is the only sensible way for scientists to couple to this numerical reality and to communicate the results to colleagues" and the public at large. "The power of visualization", Dr Smarr continues, is due to the image as a "much more universal language than the underlying mathematics in which the science is couched."

Cognitive scientist Colin Ware offers a complementary, biologically informed rationale for why we should be interested in visualization. "The human visual system is a pattern seeker", he writes in his text, Information Visualization. The visual cortex provides "the highest-bandwidth channel into human cognitive" abilities, which is why "understanding" and "seeing" are used essentially synonymously.

If decision makers can digest more than pie-charts, one wonders about the optimality of basing decisions on scalar parameters such as the odds-ratios typically reported as the result of a clinical trial, or, to return to Analytics, a FICO score.

Alan Calvitti is the founder of and currently seeking employment as a statistical analyst specializing in agile data visualization services using Mathematica.