Description
Background and aim: Analysis of data from genome-wide gene expression experiments is challenging because of the large number of variables, the demands of data management, and the need for multivariate analysis. Here we present the R package pcaGoPromoter, which facilitates the interpretation of genome-wide expression data and addresses these problems. In a first step, principal component analysis (PCA) is applied to give an overview of differences between the observations and of possible groupings. The next step is interpretation of the principal components with respect to both biological function and involvement of predicted transcription factor binding sites. The robustness of the results is evaluated using cross-validation. Plots of PCA scores and of Gene Ontology terms are available. To illustrate the functionality of the R package, we designed a serum stimulation experiment in which the main biological outcome is well documented.
Results: Samples from the serum stimulation experiment were analyzed using the Affymetrix Human Genome U133 Plus 2.0 chip. The array data were analyzed with the tools of the pcaGoPromoter package, which resulted in a clear separation of the observations into the three experimental groups: controls, serum only, and serum with inhibitor. The functional annotation of the axes in the PCA score plot showed the expected serum-promoted biological processes, such as cell cycle progression, and the predicted involvement of the expected transcription factors, including E2F. In addition, unexpected results were uncovered, e.g. cholesterol synthesis in serum-depleted cells and NF-κB activation in inhibitor-treated cells.
Conclusion: The pcaGoPromoter R package provides a collection of tools for analyzing gene expression data. It works with any platform that uses gene symbols or Entrez IDs as probe identifiers. In addition, support for several popular Affymetrix GeneChip platforms is provided. The tools give an overview of the data via principal component analysis, functional interpretation by Gene Ontology terms (biological processes), and an indication of which transcription factors may be involved. Thus, pcaGoPromoter structures the high-dimensional data of gene expression experiments and can be applied to generate hypotheses for further exploration.
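The two-step idea described above (PCA scores to see how samples group, then the genes that drive a component as input for functional interpretation) can be illustrated with a minimal sketch. This is not the pcaGoPromoter API and not the authors' implementation: it is plain Python using power iteration to find the first principal component, and the gene names, sample names, and expression values are hypothetical toy data invented for the illustration.

```python
import random

# Hypothetical toy data: rows are samples, columns are genes.
genes = ["geneA", "geneB", "geneC", "geneD"]
samples = ["control_1", "control_2", "control_3",
           "serum_1", "serum_2", "serum_3"]
expression = [
    [1.0, 5.0, 3.0, 2.0],
    [1.2, 5.1, 2.9, 2.1],
    [0.9, 4.9, 3.1, 1.9],
    [4.0, 5.0, 1.0, 2.0],
    [4.1, 5.2, 0.9, 2.1],
    [3.9, 4.8, 1.1, 2.0],
]


def center_columns(matrix):
    """Subtract each gene's mean across samples (column-wise centering)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def first_component(centered, n_iter=200, seed=0):
    """Power iteration on X^T X: returns (loadings, scores) for PC1."""
    rng = random.Random(seed)
    n_rows = len(centered)
    n_cols = len(centered[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_cols)]
    for _ in range(n_iter):
        # t = X v (projection of each sample onto the current direction)
        t = [sum(x * w for x, w in zip(row, v)) for row in centered]
        # v = X^T t, then renormalize to unit length
        v = [sum(centered[i][j] * t[i] for i in range(n_rows))
             for j in range(n_cols)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
    return v, scores


centered = center_columns(expression)
loadings, scores = first_component(centered)

# Genes ranked by absolute loading: the candidates one would pass on to
# functional annotation. The sign of a principal component is arbitrary.
ranked = sorted(zip(genes, loadings), key=lambda p: abs(p[1]), reverse=True)
ranked_genes = [name for name, _ in ranked]

for name, score in zip(samples, scores):
    print(f"{name}: PC1 score = {score:.2f}")
print("genes ranked by |loading|:", ranked_genes)
```

In this toy matrix the control and serum samples fall on opposite sides of the first component, and the two genes that differ between the groups receive the largest absolute loadings. In a real analysis, a ranked gene list of this kind is what would be handed to Gene Ontology and transcription factor binding site analysis.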