Description
Measuring genome-wide changes in transcript abundance in circulating peripheral whole blood cells is a useful way to study disease pathobiology and may help identify biomarkers and elucidate molecular mechanisms of disease. The sensitivity and interpretability of analyses carried out in this complex tissue, however, are significantly affected by its cellular heterogeneity. It is therefore desirable to quantify this heterogeneity, either to account for it or to better model interactions that may exist between the abundance of certain transcripts, particular cell types, and a given indication. Accurate enumeration of the many component cell types that make up peripheral whole blood can be costly, however, and may further complicate the sample collection process. Many approaches have been developed to infer the composition of a sample from high-dimensional transcriptomic and, more recently, epigenetic data. These approaches rely on the availability of expression profiles from isolated cells for each cell type to be enumerated. Such profiles are platform-specific, suitable datasets are rare, and generating them is expensive. No such dataset exists for the Affymetrix Gene ST platform. We present a freely available, open-source multiresponse Gaussian model capable of accurately inferring the composition of peripheral whole blood samples from Affymetrix Gene ST expression profiles. The model was developed on a cohort of patients with chronic obstructive pulmonary disease (COPD) and tested in patients with chronic heart failure.
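As a rough illustration of the idea, a multiresponse Gaussian model can be thought of as a linear regression fit jointly to several response columns, here one column per cell type. The sketch below is not the model described above: it assumes a plain ridge penalty in place of whatever regularization the actual model uses, it runs on synthetic data generated purely for demonstration, and the function names (`fit_multiresponse_ridge`, `predict_composition`) are hypothetical.

```python
import numpy as np


def fit_multiresponse_ridge(X, Y, alpha=0.1):
    """Jointly fit Y ~ X @ B + intercept for all response columns.

    X: (n_samples, n_genes) expression matrix.
    Y: (n_samples, n_cell_types) measured cell-type abundances.
    alpha: ridge penalty strength (an assumption of this sketch).
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean  # centre predictors so the intercept is not penalised
    Yc = Y - y_mean
    n_genes = X.shape[1]
    # Closed-form ridge solution, solved for every response column at once.
    B = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_genes), Xc.T @ Yc)
    intercept = y_mean - x_mean @ B
    return B, intercept


def predict_composition(X, B, intercept):
    """Predict one abundance value per cell type for each sample."""
    return X @ B + intercept


# Synthetic data for demonstration only; no real expression values are used.
rng = np.random.default_rng(0)
n_samples, n_genes, n_cell_types = 60, 20, 3
true_B = rng.normal(size=(n_genes, n_cell_types))
X = rng.normal(size=(n_samples, n_genes))
Y = X @ true_B + 0.1 * rng.normal(size=(n_samples, n_cell_types))

# Train on one subset and evaluate on a held-out subset, loosely mirroring
# the develop-in-one-cohort, test-in-another design described above.
B, intercept = fit_multiresponse_ridge(X[:40], Y[:40], alpha=0.1)
pred = predict_composition(X[40:], B, intercept)
rmse = float(np.sqrt(np.mean((pred - Y[40:]) ** 2)))
print(f"held-out RMSE: {rmse:.3f}")
```

In practice the predictors would be normalized expression values from whole blood samples and the responses would be cell counts or proportions measured by an independent method; evaluating on a separate cohort, as done here with a held-out split, guards against overfitting to one disease population.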