m:Explorer is a generic computational method for identifying process-specific gene regulators from high-throughput genomic data. It applies multinomial logistic regression models to select regulators whose target genes are highly predictive of process-related gene function. Target genes may be defined from heterogeneous data sources, and multiple process sub-classes are allowed.
This website serves three primary purposes. First, we provide the source code of m:Explorer as an R package. Second, we provide a comprehensive transcription factor (TF) dataset that covers most TFs in the budding yeast S. cerevisiae. Third, we provide an online m:Explorer web service that applies this dataset to predict process-specific TFs in yeast.
Some m:Explorer TF predictions in yeast have been validated experimentally. Further details can be found in our paper in Genome Biology:
Jüri Reimand, Anu Aun, Jaak Vilo, Juan M. Vaquerizas, Juhan Sedman, Nicholas M. Luscombe: m:Explorer - multinomial regression models reveal positive and negative regulators of longevity in yeast quiescence. Genome Biol (2012) 13:R55; doi: 10.1186/gb-2012-13-6-r55.
Yeast TF dataset
- [ Example I ] - 396 periodically expressed yeast genes, labeled according to the cell cycle phase in which they are expressed (G1, S, G2, M; de Lichtenberg, 2005). This gene list is compared with a TF target dataset of 9 cell cycle TFs and 6 related factors.
- [ Example II ] - 99 yeast genes expressed during the synthesis phase (S) of the cell cycle, with no phase labels specified (de Lichtenberg, 2005). This gene list is compared with a TF target dataset of 9 cell cycle TFs and 6 related factors.
- [ Example III ] - Same as Example I, except that the full dataset of 285 TFs is used (slower).
The yeast dataset applied here includes genome-wide targets for 285 yeast TFs, using three types of evidence: (i) differentially expressed genes from TF perturbation experiments on microarrays, (ii) TF binding sites in gene promoters from ChIP-chip and PBM experiments and computational predictions, and (iii) nucleosome positioning measurements at TFBS loci. All measurements are discretised using cut-offs of statistical significance and grouped into categories such as 'upregulated', 'nucleosome-depleted binding-site', or 'no significant signal'. Two versions of the nucleosome positioning data are available: one measured in rich medium (YPD) and one in non-optimal medium (ethanol). We also provide subsets of these data in which some sources of evidence have been excluded.
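To illustrate the discretisation step, the sketch below maps raw measurements for single genes to categories like the ones named above, producing a regulator profile as a gene-to-category map. Everything specific here is hypothetical: the cut-off `ALPHA = 0.05`, the function names, the toy gene names and values, and the labels 'downregulated' and 'binding-site'. The actual thresholds and category sets used in the m:Explorer yeast dataset may differ.

```python
from typing import Optional

# Hypothetical significance cut-off; the real dataset may use different
# thresholds for each type of evidence.
ALPHA = 0.05


def expression_category(p_value: float, log_fold_change: float,
                        alpha: float = ALPHA) -> str:
    """Discretise one TF-perturbation measurement for one gene."""
    if p_value >= alpha:
        return "no significant signal"
    # 'downregulated' is an assumed counterpart of the documented 'upregulated'
    return "upregulated" if log_fold_change > 0 else "downregulated"


def binding_category(binding_p: float, nucleosome_depleted: Optional[bool],
                     alpha: float = ALPHA) -> str:
    """Discretise binding evidence, refined by nucleosome positioning if known."""
    if binding_p >= alpha:
        return "no significant signal"
    if nucleosome_depleted:
        return "nucleosome-depleted binding-site"
    # 'binding-site' is an assumed label for bound sites without depletion evidence
    return "binding-site"


# Toy perturbation data for one hypothetical TF: gene -> (p-value, log fold change)
perturbation = {"geneA": (0.001, 1.8), "geneB": (0.002, -2.1), "geneC": (0.30, 0.4)}
T_expr = {g: expression_category(p, lfc) for g, (p, lfc) in perturbation.items()}
print(T_expr)
```

Keeping each evidence type in its own profile matches the idea that one TF can contribute several alternative regulator profiles built from different data sources.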
More information and dataset downloads can be found here.
1. Let P be a process profile and T_1, ..., T_j be regulator profiles of TFs. P assigns process-specific genes to classes (for example, cell cycle phases), while each T_i assigns genes to categories of regulatory evidence for one TF and its targets.
2. Fit an intercept-only multinomial generalised linear regression model M_0 = 'P ~ 1' to represent a TF-independent distribution of the genes in P, in which every gene has the same class probabilities (the null model).
3. Fit a single-predictor model M_i = 'P ~ T_i' for a TF, to represent a TF-dependent distribution of the genes in P (the alternative model).
4. To assess the significance of TF profile T_i in explaining P, compare the null model M_0 and the alternative model M_i with a likelihood-ratio test, evaluating the difference in deviance against a chi-square distribution.
5. Repeat steps 3 and 4 for all TFs, and correct the resulting p-values for multiple testing.
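The steps above can be sketched in Python under one simplifying assumption: each regulator profile T_i is treated as a categorical factor. In that case both fits have a closed form. The intercept-only model M_0 reproduces the overall class frequencies of P, and M_i reproduces the class frequencies within each T_i category, so the deviance difference equals the G statistic of the P-by-T_i contingency table, with (classes - 1) x (categories - 1) degrees of freedom. A numerical multinomial fit should reach the same log-likelihoods up to optimiser tolerance. This is not the m:Explorer R implementation: the names `lr_test`, `bh_adjust`, `chi2_sf`, the toy genes and TF names are hypothetical, and Benjamini-Hochberg is only an assumed choice for the multiple-testing correction in step 5.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 2 (exponential) or df = 1 (complementary error function)
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    # Climb two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(q, 1.0)


def lr_test(process: dict, regulator: dict, default: str = "no significant signal"):
    """Compare M_0 = 'P ~ 1' with M_i = 'P ~ T_i' for one TF (steps 2 to 4).

    process:   gene -> class label in P
    regulator: gene -> category in T_i (genes absent from T_i get `default`)
    Returns (deviance difference, degrees of freedom, p-value).
    """
    genes = list(process)
    n = len(genes)
    classes = Counter(process[g] for g in genes)
    levels = Counter(regulator.get(g, default) for g in genes)
    cells = Counter((regulator.get(g, default), process[g]) for g in genes)

    # Null model: fitted class probabilities are the overall class frequencies.
    ll0 = sum(c * math.log(c / n) for c in classes.values())
    # Alternative model: fitted probabilities are the class frequencies
    # within each T_i category.
    ll1 = sum(c * math.log(c / levels[lvl]) for (lvl, _), c in cells.items())

    deviance = max(0.0, 2.0 * (ll1 - ll0))  # clamp tiny negative rounding error
    df = (len(classes) - 1) * (len(levels) - 1)
    p_value = chi2_sf(deviance, df) if df > 0 else 1.0
    return deviance, df, p_value


def bh_adjust(p_values: dict) -> dict:
    """Benjamini-Hochberg adjusted p-values (an assumed correction method)."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted, running_min = {}, 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


# Toy process profile P: 8 hypothetical genes in two classes.
P = {f"g{i}": ("G1" if i <= 4 else "S") for i in range(1, 9)}
# Two hypothetical TF profiles: one aligned with P, one unrelated to it.
profiles = {
    "TF_aligned": {f"g{i}": "bound" for i in range(1, 5)},
    "TF_unrelated": {g: "bound" for g in ("g1", "g2", "g5", "g6")},
}

raw = {tf: lr_test(P, T)[2] for tf, T in profiles.items()}  # step 5: all TFs
for tf, q in bh_adjust(raw).items():
    print(tf, round(raw[tf], 5), round(q, 5))
```

In the toy run, the aligned profile separates the two classes perfectly and receives a small p-value, while the unrelated profile splits both classes evenly and its deviance difference is essentially zero.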
The R package with source code can be downloaded here.