Description
While blood transcriptional profiling has improved the diagnosis of adult tuberculosis (TB) and the understanding of its pathogenesis, no studies applying gene expression profiling to children with TB have been described so far. In this study, we compared whole blood gene expression in childhood TB patients with that in healthy latently infected (LTBI) and uninfected (HC) children in a cohort of Warao Amerindians in the Delta Amacuro in Venezuela. Using random forest analysis, we identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. LTBI and for TB vs. LTBI vs. HC in our dataset. Furthermore, a minimal set of only 9 genes showed a significant predictive value for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. Additionally, we identified a minimal set of 42 genes whose predictive value for the comparison of TB vs. LTBI vs. HC was comparable to that of the 116-gene set in both our dataset and the previously published literature cohorts. To identify a robust, representative gene set that would hold up across different ethnic populations, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these ten genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to diagnose 78% of the TB cases correctly with 100% specificity. We conclude that our data justify further exploration of our signature set as biomarkers to diagnose childhood TB.
Furthermore, because different biomarkers are evidently identified in ethnically distinct cohorts, it is important to cross-validate newly identified markers in all available cohorts.