Description
Purpose: The cause of labour initiation has yet to be fully elucidated for human pregnancy. This has hindered attempts to find effective therapies for the prevention of preterm labour, which affects up to 10% of pregnancies in the UK and it is the most dominant cause of perinatal death (75% of all cases). The myometrium of the uterus is where contractions that characterise labour take place, and it is here where changes at the molecular level responsible for triggering labour potential originate from. We used RNA-Seq-based mRNA sequencing technology in an attempt to identify mRNA transcripts that are differentially expressed in the myometrium upon the onset of labour, by comparing the expression profiles of tissues samples that represented non-labour (TNL), early labour (TEaL, = 3 cm cervical dilation) and established labour (TEsL, > 3 cm cervical dilation) states at term (> 37 weeks gestation) pregnancy. Methods: Myometrial biopsies from women undergoing Caesarean section were collected in accordance with the Declaration of Helsinki guidelines, and with approval from the local research ethics committee for Chelsea & Westminster Hospital (London, UK; Ethics No. 10/H0801/45). Informed written consent was obtained from all women who participated. Biopsies were excised from the upper margin of the incision made in the lower segment of the uterus, immediately washed with Dulbecco's phosphate-buffered saline (Sigma) and dissected into pieces approximately measuring 2-3 mm3. For RNA study, biopsies were immersed in RNAlater (Sigma) within 6 minutes after biopsy excision from the uterus and stored at 4°c overnight, before being taken out of RNAlater solution to be frozen for long-term storage at -80°c. For CHIP study, biopsies were snap-frozen in liquid nitrogen and stored at -80°c. All specimens were categorised into four groups according to their labour stages: preterm not in labour (PTNL, n=6), term not in labour (TNL, n=8), term in early labour (TEaL, defined as cervical dilatation <3 cm, n=8) and term in established labour (TEsL, defined as cervical dilatation >3 cm, n=6). For each sample, 60-100 mg of myometrium tissue were extracted in TRIzol (Life Technologies) by mechanical homogenisation in a Precellys 24 bead-based homogeniser using 5 cycles of 5000 rpm for 20 seconds, before chloroform treatment and centrifugation at 4°c. RNA was extracted from the aqueous phase of centrifuged homogenates using the TRIzol Plus RNA Purification kit (Life Technologies) with on-column DNase treatment prior to elution, all according to manufacturer's instructions. Final RNA samples were stored at -80°c. The quantity and quality RNA was measured using a Nanodrop ND-1000 spectrophotometer (LabTech), Qubit fluorimeter (Life Technologies) and Bioanalyser 2100 (Agilent Technologies). Preparation of cDNA libraries was carried out using the TruSeq Stranded mRNA Sample Preparation kit (Illumina), following the high-throughput sample (HT) protocol. The quantity and quality of cDNA libraries were also tested by a Qubit fluorimeter and Bioanalyser 2100. TruSeq Stranded libraries were then multiplexed and sequenced with the average of 42 million DNA fragments per sample (100 bp paired-end reads). Quality control was performed using FastQC software (version 0.11.2). RNA-Seq reads was aligned to the GRCh38 Homo sapiens reference genome downloaded from Ensembl (release 81) with HISAT version 2.0.1 using parameters of --known-splicesite-infile --dta-cufflinks --rna-strandness RF --phred33 –p 4 -q. A list of known splice sites generated from the Ensembl GTF file using an accessory python script included in the HISAT2 package was provided to --known-splicesite-infile, of which HISAT2 will make use to assist the alignment of reads spanning two or more exons. Ensembl annotated a total of 65,989 genes, which includes 20,276 protein coding genes. As one human gene usually contains multiple transcript models, we thus conducted a transcript merging procedure to produce gene level models for expression analysis. Specifically, exons labeled as 'retained_intron' were first excluded, then overlapping interval exons of each gene were merged and a final gene level model was produced in GFF format. Only uniquely mapped (i.e. reads with the tag of NH:i:1) reads were used to produce gene read counts and calculate gene expression levels. The raw read count matrix was normalised with DESeq2 (version 1.6.3). Expression level of each gene in each sample was represented as RPKM (reads per kilobase per million mapped reads). Differentially expressed genes (DEGs) between two groups of samples were identified with DESeq2 (version 1.6.3), edgeR (version 3.8.6) and Cuffdiff (version 2.2.1). For DESeq2 and edgeR, we used the normalised read count matrix as input, and for Cuffdiff, we used the alignment bam files with uniquely mapped reads as input. Raw p-values were adjusted by FDR to produce q-values, and q-value of 0.05 were chosen as the cut-off for statistical significance in DESeq2, edgeR as well as Cuffdiff. Results: 22 RNA samples from three different labour groups were sequenced and an average of 53 million reads were obtained from each sample. More than 97.34% of reads were successfully aligned to GRCh38 reference human genome and the unique concordant pair ratio was greater than 92.39%. In total, 60,593 genes were mapped with the following criteria: (1) at least one RNA-seq read assigned to a gene; (2) we only assign a read to a gene when > 90% of this read falls into the exon regions of this gene. The principal component analysis (PCA) of these 22 samples showed that TNL and TEsL samples formed two distinct clusters whereas the TEaL group featured relatively great internal differences. Nevertheless, half of the samples from TEaL group was clustered with the TNL group and the other half was more separated yet closer to two samples of TEsL group. To determine the transcripts associated with labour, three software packages (DESeq2, EdgeR and Cuffdiff) were used to perform differential expression The transcript with a q value <0.05 in at least two methods was defined as a shared differentially expressed gene (DEG). As a result, 132 and 399 genes were identified comparing TNL with TEaL and TEsL, respectively, whereas no gene was significantly differentially expressed between TEaL and TEsL groups. Due to big differences among individual samples, in this study, the expression fold change (FC) was calculated as the ratio of median reads per kilobase per million mapped reads (RPKMs) with the bigger median RPKM divided by the smaller median RPKM. In order to minimise the noise derived from genes with low expression but high FC, we further filtered gene lists according to the following rational: if the original value of any median RPKM was <1, we artificially turned it into 1 before calculating the FC. Finally, two robust gene lists with an expression FC >1.5 between two groups (TNL vs. TEaL and TNL vs. TEsL) were generated containing 70 and 232 DEGs, respectively. Conclusions: This study, for the first time, identifies differentially expressed genes (DEGs) and pathways that are involved in the myometrial transition from non-labouring to labouring phenotype by using samples from different stages of pregnancy and labour. The DEG lists are generated by subjecting the raw data to three sofware packages (DESeq2, edgeR and Cuffdiff), which makes the yield DEGs more robust. We conclude that some early responsive genes in circadian clock and inflammation pathways are likely to account for the labour onset and no significant changes are found on the transcription level once the labour starts. Overall design: Comparisons made between no labour (TNL, n = 8), early labour (TEaL, n = 8) and established labour (TEsL, n = 6) lower segment myometrial tissue samples, which were collected during Caesarean section (CS) with informed written consent