Motivation: Sample source, procurement process, and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry the potential concern of removing intra-group biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging using standard algorithms designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori.
Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction.
Sex, Specimen part, Disease, Disease stage, Race
To define the molecular abnormalities at the stem cell level in polycythemia vera (PV), we examined global gene expression in circulating CD34+ cells from 19 JAK2 V617F-positive PV patients and 6 normal individuals using Affymetrix oligonucleotide microarray technology. We observed that CD34+ cell gene expression differed not only between the PV patients and the normal controls but also between male and female PV patients. Based on these gender-specific differences in gene expression, we were able to identify 102 genes differentially regulated concordantly in both men and women, which likely represent a core set of genes whose dysregulation is involved in the pathogenesis of PV. Gene expression was verified by Q-PCR of patient CD34+ cell RNA. Using the 102 gene set and unsupervised hierarchical clustering, the 19 PV patients could be separated into two groups that differed significantly with respect to hemoglobin level, thrombosis frequency, splenomegaly, splenectomy or chemotherapy exposure, leukemic transformation and overall survival. These results were confirmed using top scoring pairs, which identified a different set of 29 genes that independently segregated the 19 patients into the same two clinical groups: those with an aggressive form of the disease (7 patients), and those with an indolent form (12 patients).
Two clinical phenotypes in polycythemia vera.
Sex, Disease
Aberrant activation of signaling pathways controlled in normal epithelial cells by the epidermal growth factor receptor (EGFR) has been linked to cetuximab (a monoclonal antibody against EGFR) resistance in head and neck squamous cell carcinoma (HNSCC). To infer relevant and specific pathway activation downstream of EGFR from gene expression in HNSCC, we generated gene expression signatures using immortalized keratinocytes (HaCaT) subjected to either ligand stimulation or pharmacological inhibition of the signaling intermediaries PI-3-Kinase and MEK or transfected with EGFR, RELA/p65, or HRASVal12. The gene expression patterns that distinguished the various HaCaT variants and conditions were inferred using the Markov chain Monte Carlo (MCMC) matrix factorization algorithm Coordinated Gene Activity in Pattern Sets (CoGAPS). This approach inferred gene expression signatures with greater relevance to cell signaling pathway activation than the expression signatures inferred with standard linear models. Furthermore, the pathway signature generated using HaCaT-HRASVal12 was associated with the cetuximab treatment response in isogenic cetuximab-sensitive (UMSCC1) and -resistant (1CC8) cell lines. Our data suggest that the CoGAPS algorithm can generate gene expression signatures that are pertinent to downstream effects of receptor signaling pathway activation and may be useful in modeling resistance mechanisms to targeted therapies.
Gene expression signatures modulated by epidermal growth factor receptor activation and their relationship to cetuximab resistance in head and neck squamous cell carcinoma.
Cell line, Treatment
To determine the differential expression of KRAS-variant HNSCC (head and neck squamous cell carcinoma) cell lines.
A 3'-UTR KRAS-variant is associated with cisplatin resistance in patients with recurrent and/or metastatic head and neck squamous cell carcinoma.
Specimen part, Cell line
This SuperSeries is composed of the SubSeries listed below.
Integrative epigenome-wide analysis demonstrates that DNA methylation may mediate genetic risk in inflammatory bowel disease.
Sex, Age, Specimen part, Subject
Epigenetic alterations may provide important insights into gene-environment interaction in inflammatory bowel disease (IBD). Here we observe epigenome-wide DNA methylation differences in 240 newly-diagnosed IBD cases and 190 controls. These include 439 differentially methylated positions (DMPs) and 5 differentially methylated regions (DMRs), which we study in detail using whole genome bisulphite sequencing. We replicate the top DMP (RPS6KA2) and DMRs (VMP1, ITGB2, TXK) in an independent cohort.
Integrative epigenome-wide analysis demonstrates that DNA methylation may mediate genetic risk in inflammatory bowel disease.
Sex, Age, Specimen part
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for sequence discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for these and for qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium.
No sample metadata fields
Huntington's disease (HD) is an inherited neurodegenerative disorder caused by an expanded stretch of CAG trinucleotide repeats that results in neuronal dysfunction and death. We made induced pluripotent stem cell (iPSC) lines from HD patients and controls. Though no obvious effects of the CAG expansion on reprogramming or subsequent neural stem cell (NSC) production were seen, HD-NSCs showed CAG expansion-associated gene expression patterns and, upon differentiation, changes in electrophysiology, metabolism, cell adhesion, and ultimately an increased risk of cell death for both medium and longer CAG repeat expansions, with some deficits greater in cells from longer repeat HD NSCs. The HD180 lines were more vulnerable than control lines to cellular stressors and BDNF withdrawal using a range of assays across consortium laboratories. This HD iPSC collection represents a unique and well-characterized resource to elucidate disease mechanisms in HD and provides a novel human stem cell platform for screening new candidate therapeutics.
Induced pluripotent stem cells from patients with Huntington's disease show CAG-repeat-expansion-associated phenotypes.
Specimen part, Disease, Disease stage
To study parent-of-origin effects on gene expression, we performed RNAseq analysis (100bp single end reads) of 165 children who formed part of mother/father/child trios for which genotype data were available from the HapMap and/or 1000 Genomes Projects. Based on phased genotypes at heterozygous SNP positions, we generated allelic counts for expression of the maternal and paternal alleles in each individual. This analysis reveals significant bias in the expression of the parental alleles for dozens of genes, including both previously known and novel imprinted transcripts. Overall design: This submission contains RNAseq data from 165 children from mother/father/child trios studied as part of the 1000 Genomes and/or HapMap projects. We provide raw fastq format reads, and processed read counts per gene. Allelic count information can be provided by directly contacting the authors.
RNA-Seq in 296 phased trios provides a high-resolution map of genomic imprinting.
Specimen part, Cell line, Subject
We performed whole-genome gene expression profiling in Pik3cg-/- mice and subsequent gene ontology clustering of differentially expressed genes compared to wild type mice, in order to investigate the role of Pik3cg in platelet membrane biogenesis and blood coagulation.
Maps of open chromatin guide the functional follow-up of genome-wide association signals: application to hematological traits.
Sex, Specimen part