Description
Considerable variation in gene expression data obtained from different DNA microarray platforms has been demonstrated. However, the variation arising from labeling protocols has not yet been characterized. To analyze the variation associated with T7-based RNA amplification/labeling methods, aliquots of the Stratagene Human Universal Reference RNA were labeled using three eukaryotic target preparation methods and hybridized to a single array type (Affymetrix U95Av2). Variability was measured in the yield and size distribution of the labeled products, as well as in the gene expression results. Compared with unamplified mRNA, all methods showed a shift in cRNA size distribution, with a significant increase in short transcripts for methods using long in vitro transcription (IVT) reactions. Intra-method reproducibility showed correlation coefficients >0.99, whereas inter-method comparisons showed coefficients ranging from 0.94 to 0.98 and a nearly two-fold increase in the coefficient of variation. Fold amplification for each method was positively correlated with the number of genes called present. Two factors that introduced significant bias into the gene expression data were observed: (a) the number of labeled nucleotides, which introduces a sequence-dependent bias, and (b) the length of the IVT reaction, which introduces a transcript-size-dependent bias. This study provides evidence of amplification-method-dependent biases in gene expression data.
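The intra- versus inter-method comparison above rests on two summary statistics: the Pearson correlation coefficient between pairs of arrays and the coefficient of variation (standard deviation divided by mean) of each gene's signal across arrays. The Python sketch below shows one way these could be computed. It is only an illustration and does not reproduce the study's analysis. The function names `pearson_r` and `mean_cv` are hypothetical. The toy signal values are invented for demonstration and are not data from this study. Whether the original comparisons used raw or log-transformed, normalized signals is not stated here, so the choice of input scale is an assumption left to the reader.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length signal vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_cv(replicates):
    """Mean per-gene coefficient of variation (SD / mean) across arrays.

    `replicates` is a list of arrays, each a list of per-gene signals in the
    same gene order. At least two arrays are required for a sample SD.
    """
    per_gene = zip(*replicates)  # regroup values gene by gene
    return mean(stdev(vals) / mean(vals) for vals in per_gene)


# Hypothetical toy signals for four genes (NOT data from the study).
method_a = [[100.0, 250.0, 40.0, 900.0],   # replicate 1, method A
            [104.0, 245.0, 42.0, 910.0]]   # replicate 2, method A
method_b = [[130.0, 200.0, 60.0, 800.0]]   # one array, method B

intra_r = pearson_r(method_a[0], method_a[1])  # same method, two replicates
inter_r = pearson_r(method_a[0], method_b[0])  # different methods
intra_cv = mean_cv(method_a)
inter_cv = mean_cv([method_a[0], method_b[0]])
print(f"intra r={intra_r:.4f}  inter r={inter_r:.4f}")
print(f"intra CV={intra_cv:.3f}  inter CV={inter_cv:.3f}")
```

With real data, each method would contribute several replicate arrays. Pairwise correlations and per-gene CVs would then be summarized within and between methods.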