update 10.1.09

10.1.09

input dataset based on Partek "intermediate" fold change normalized estimator

gene expr by dx [rows 25-22004 complete]
visually interesting patterns relative to DX [rows 25-1530]

gene expr by gender [rows 25-999 only]
gene expr by HX psych [rows 25-999 only]


notes:

- these visualizations represent "ground truth" empirical distributions post-Partek's scan normalization, but prior to any other multivariate methods

- while it would be convenient to distill these results even further, that would only be possible by introducing additional assumptions

-- for example, robust statistics such as median and interquartile range are only meaningful summaries for unimodal distributions, an assumption that is not warranted here

- because of the small sample size of n=30, estimators of interactions, eg 2 genes with marginal differential expression profiles "adding up" to a strong joint differential expression pattern, become exponentially under-powered with the number of genes considered jointly, eg 2 (quadratic in the number of candidate genes), 3 (cubic).

- note the marginally higher entropy of males v females in the "by gender" [incomplete] viz

- it is difficult to attribute any differential expression across subgroups of interest here (eg DX) to biological causes vs. individual heterogeneity or experimental noise

- overall, the expression patterns from nBP=9, nCO=8, nSZ=13 present several puzzles but no conclusive answers, and by themselves cannot drive effective modeling at the clinical level due to very high inter-DX-class overlap and small sample size.

- The overarching motivation is to explore any linkage between the molecular patterns and clinical-level parameters, without introducing unduly complex assumptions in statistical methodology. This viz is essentially a lossless representation of the underlying dataset Gursh provided (I've purposely suppressed fold-change units along x-axis to focus on the relative DX patterns).
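
The note above on interaction estimators can be made concrete with a rough count of how many candidate gene sets compete for the same 30 samples. This is only a sketch under stated assumptions: it assumes each row in the dx view (rows 25-22004) is one gene/probe, and the names N_GENES and candidate_sets are illustrative, not part of the dataset or of Partek.

```python
from math import comb

N_SAMPLES = 9 + 8 + 13        # nBP + nCO + nSZ, from the notes above
N_GENES = 22004 - 25 + 1      # assumption: one gene/probe per row in rows 25-22004


def candidate_sets(n_genes: int, order: int) -> int:
    """Number of distinct unordered gene sets of size `order`."""
    return comb(n_genes, order)


# Compare the number of candidate sets against the number of samples
# for single genes, pairs, and triples.
for order in (1, 2, 3):
    n_sets = candidate_sets(N_GENES, order)
    print(f"order {order}: {n_sets:,} candidate sets, "
          f"{n_sets / N_SAMPLES:,.0f} per sample")
```

The count says nothing about which combinations are biologically plausible; it only illustrates how quickly the search space outgrows n=30 as the interaction order increases.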

"In many of the applications of high-density oligonucleotide arrays, the goal is to learn how RNA populations differ in expression in response to genetic and environmental differences. For example, large expression of a particular gene or genes may cause an illness resulting in variation between diseased and normal tissue. These sources of variation are referred to as interesting variation. Observed expression levels also include variation introduced during sample preparation, manufacture of the arrays and processing (labeling, hybridization, scanning). These are referred to as sources of obscuring variation."
- RA Irizarry et al "Exploration, normalization, and summaries of high density oligonucleotide array probe level data"

"Gene expression profiling is ... the determination of changes in abundance (amount) of each mRNA in a given biological sample. DNA microarray is measuring relative changes in abundance relative to a control sample. This is an important distinction from other biochemical assays since we are not measuring the specific concentrations of each mRNA in the sample."
- MD Kane "Introduction to gene expression profiling with DNA microarray technology" in G Hardiman ed "Microarray Innovations: Technology and Experimentation"

"Slight variations in distance from the feature to the confocal scanner caused by minor warps or imperfections will drastically change the readings, especially when measuring slight changes in quantity ... The dynamic range within a cell may be quite large, from <1 mRNA to thousands per cell ... the problems lie in the nonlinearity of probe response to mRNA copy number and intensity-dependent bias ... Classification has become a much-used method in clinical disease prognosis and diagnosis. It is important to understand the sources of variability in gene expression data, both biological and technical that can cause misclassification ... Normalization methods sometimes greatly affect the outcome and interpretation of array analysis ... Techniques such as feature selection and classification are highly susceptible to these changes and biases, often resulting in nonoverlapping sets of genes being selected as the best classifiers without a robust way of accounting for cross-normalization differences, or incorporating prior knowledge as selection criteria"
- P Stafford "Data normalization selection" in G Hardiman ed "Microarray Innovations: Technology and Experimentation"