Alan Calvitti PhD, EECS

Mission Statement

Statigrafix aims to provide researchers and investigators in a variety of application domains (domain experts) with exploratory data analysis (EDA) services. These comprise rapid-prototype development of data visualization tools, close collaboration to aid hypothesis formation (knowledge discovery), and communication of results through production of figures and posters.


Strategy for Knowledge Discovery @ Statigrafix

The nascent analytics industry, for which San Diego is a hub, tends to focus on large-scale software development of exploratory and confirmatory data analysis (EDA and CDA) applications, typically for large to very large electronic data sets. Because of their large footprint, these applications are ill-suited to help individual investigators or small teams of researchers with their EDA needs. In this context, it's worth pointing out that the winner of the $1,000,000 Netflix prize to improve the accuracy of Netflix's movie recommendation system via machine learning methods did not even reach the 10% improvement benchmark originally stipulated.

In contrast, Statigrafix focuses on the needs of individual investigators and small teams to gain insight from data via a complementary approach, summarized as follows.

  • Exploratory Data Analysis. First distinguished from Confirmatory Data Analysis (CDA) as a separate statistical discipline by John Tukey (1915-2000), the father of 21st century graphical displays, EDA focuses on hypothesis formation and knowledge discovery in datasets and is complementary to CDA.

  • Data Visualization. Statigrafix's core service is rapid-prototype development of Data Visualization (VIZ) tools, enabled by the flexibility of Mathematica (wolfram.com). VIZ tools include relatively simple templates such as scatterplot matrices, but also custom structured graphics such as sophisticated timeline or calendar-based plots for time-series data. A key precept of VIZ is to give domain experts first an overview of the data, followed by additional filtering and detail on demand. VIZ allows the human visual system to effectively identify informative events, trends and patterns in datasets. As Howard Wainer points out in his text "Graphic Discovery," two centuries have passed since Playfair's pioneering efforts, following the idea that a graph can tell us things easily that might not have been seen otherwise.

  • Robustness to missing values. Real-world datasets contain varying degrees of missing values. Although missing values are conceptually distinct from existing data, a typical step in CDA is to impute them. In contrast, VIZ approaches don't require imputation: it is often possible to structure graphical elements in such a way that patterns in existing data values remain visible.

  • Iterative Exploration. EDA seems most effective when structured as an iterated dialogue between domain expert and analyst. Each iteration can be broken down into stages. A typical sequence comprises: 1. Data cleaning and aggregation. 2. Design and programming of graphics, possibly annotated. 3. Joint exploration of patterns with domain experts. This typically results in insight into the data, suggestions for subsequent exploration and requests for additional details.
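
The point about missing values in the list above can be illustrated with a minimal sketch. It is written in Python using only the standard library, not in Mathematica, and it is not Statigrafix code: the function name sparkline, the glyphs and the sample readings are all assumptions made up for illustration. Missing readings are drawn with their own marker rather than imputed, so the pattern in the existing values stays readable.

```python
# Hypothetical helper: draw a one-line text chart of a series that has gaps.
# Missing readings (None) get their own glyph; nothing is imputed.

BARS = "▁▂▃▄▅▆▇█"   # eight levels, lowest to highest
GAP = "·"            # marker for a missing value


def sparkline(values):
    """Return one character per value; None becomes GAP."""
    present = [v for v in values if v is not None]
    if not present:
        return GAP * len(values)
    lo, hi = min(present), max(present)
    span = (hi - lo) or 1  # avoid division by zero for a flat series
    chars = []
    for v in values:
        if v is None:
            chars.append(GAP)
        else:
            # Scale the value to an index into BARS (0 = lowest, 7 = highest).
            level = round((v - lo) / span * (len(BARS) - 1))
            chars.append(BARS[level])
    return "".join(chars)


# Made-up daily readings with two gaps.
readings = [3, 4, None, 6, 9, None, 8, 5]
print(sparkline(readings))
```

The scale is computed from the existing values only, so the gaps neither distort the chart nor disappear from it. The same idea carries over to richer graphics, for example leaving empty cells in a calendar-based plot.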



    Background

    The idea for Statigrafix emerged from my experience as a Postdoctoral Fellow working on a portfolio of data analysis projects at the UCSD School of Medicine, the VA San Diego Healthcare System and the California Institute for Telecommunications and Information Technology. The application domains included healthcare processes, medical informatics, biomedicine and telecom, all examples of complex systems.