m:Explorer is a generic computational method for identifying process-specific gene regulators from high-throughput genomic data. It applies multinomial logistic regression models to select regulators whose target genes are highly predictive of process-related gene function. Target genes may be defined from heterogeneous data sources, and multiple process sub-classes are allowed.
This website serves three primary purposes. First, we provide the source code of m:Explorer as an R package. Second, we provide a comprehensive transcription factor (TF) dataset that covers most TFs in the budding yeast S. cerevisiae. Third, we provide an online m:Explorer web service that applies this dataset to predict process-specific TFs in yeast.
Some m:Explorer TF predictions in yeast have been validated experimentally. Further details can be found in our paper in Genome Biology:
Jüri Reimand, Anu Aun, Jaak Vilo, Juan M. Vaquerizas, Juhan Sedman, Nicholas M. Luscombe: m:Explorer - multinomial regression models reveal positive and negative regulators of longevity in yeast quiescence (2012) Genome Biol 13:R55; doi: 10.1186/gb-2012-13-6-r55. [PDF]
Example queries
- [ Example I ] - 396 periodically expressed yeast genes, labeled according to the cell cycle phase in which they are expressed (G1, S, G2, M; de Lichtenberg, 2005). This gene list is compared with a TF target dataset of 9 cell cycle TFs and 6 related factors.
- [ Example II ] - 99 yeast genes expressed during the synthesis phase (S) of the cell cycle, with no phase labels specified (de Lichtenberg, 2005). This gene list is compared with a TF target dataset of 9 cell cycle TFs and 6 related factors.
- [ Example III ] - Same as Example I, except that the full dataset of 285 TFs is used (slower).
Yeast TF dataset
The yeast dataset applied here includes genome-wide targets for 285 yeast TFs, using three types of evidence: (i) differentially expressed genes from TF perturbation experiments on microarrays, (ii) TF binding sites in gene promoters from ChIP-chip and PBM experiments and computational predictions, and (iii) nucleosome positioning measurements in TFBS loci. All measurements are discretised using cut-offs of statistical significance, and grouped into categories like 'upregulated', 'nucleosome-depleted binding-site', or 'no significant signal'. Two versions of nucleosome positioning are available: one measured in rich medium (YPD) and one in non-optimal medium (ethanol). We also provide subsets of these data where some sources of evidence have been excluded.
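As a rough illustration of the discretisation step, the sketch below maps one expression measurement to a category label. It is not the procedure used to build the dataset: the function name `discretise_expression`, the cut-off `alpha=0.05`, and the 'downregulated' label are assumptions made for the example.

```python
def discretise_expression(log_fold_change, p_value, alpha=0.05):
    """Map one gene's response to a TF perturbation onto a category label.

    alpha is an assumed significance cut-off, not the one used for the dataset.
    """
    if p_value >= alpha or log_fold_change == 0:
        return "no significant signal"
    # The 'downregulated' label is a hypothetical counterpart to 'upregulated'.
    return "upregulated" if log_fold_change > 0 else "downregulated"


# Hypothetical measurements for three genes: (log fold change, p-value).
measurements = {"geneA": (1.8, 0.001), "geneB": (-0.9, 0.02), "geneC": (0.4, 0.6)}
profile = {gene: discretise_expression(lfc, p) for gene, (lfc, p) in measurements.items()}
```

The resulting `profile` dictionary has one category per gene, which is the general shape a regulator profile needs for the method described below.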
More information and dataset downloads can be found here.
m:Explorer method
1. Let P be a process profile and T_1..T_j be regulator profiles of TFs. P classifies process-specific genes, while each T_i contains regulatory data for a TF and its targets.
2. Fit an intercept-only multinomial generalised linear regression model M_0='P~1' to represent uniform distribution of process-specific genes in P (the null model).
3. Fit a single-predictor model M_i='P~T_i' for a TF, to represent TF-dependent distribution of the genes in P (the alternative model).
4. To assess the significance of TF profile T_i in explaining P, compare the null model M_0 with the alternative model M_i using a likelihood-ratio test: the difference in deviance between the two models is evaluated against a chi-square distribution.
5. Repeat steps 3 and 4 for all TFs. Correct the resulting p-values for multiple testing.
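The steps above can be sketched in a few lines of standard-library Python. This is not the m:Explorer R package. It relies on the fact that, for a single categorical predictor, the maximum-likelihood class probabilities of a multinomial model are the observed class frequencies within each predictor category, so both models can be evaluated in closed form. The function names, the toy profiles, and the choice of Benjamini-Hochberg correction are assumptions made for illustration; the text above does not name a specific correction.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y), then apply
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) until a = df / 2.
    a, q = (0.5, math.erfc(math.sqrt(y))) if df % 2 else (1.0, math.exp(-y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def log_likelihood(labels, groups):
    """Maximised multinomial log-likelihood with class probabilities
    estimated separately within each group."""
    cells = Counter(zip(groups, labels))
    totals = Counter(groups)
    return sum(n * math.log(n / totals[g]) for (g, _), n in cells.items())


def lr_test(process, regulator):
    """Compare M_0 = 'P~1' with M_i = 'P~T_i'; return (deviance difference, p-value)."""
    ll_null = log_likelihood(process, ["all"] * len(process))  # step 2
    ll_alt = log_likelihood(process, regulator)                # step 3
    deviance_diff = 2.0 * (ll_alt - ll_null)                   # step 4
    df = (len(set(process)) - 1) * (len(set(regulator)) - 1)
    if df == 0:
        return deviance_diff, 1.0
    return deviance_diff, chi2_sf(deviance_diff, df)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (an assumed choice of correction)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: 24 genes with a process label each (step 1).
process = ["G1"] * 6 + ["S"] * 6 + ["none"] * 12
regulators = {
    "TF_A": ["bound"] * 6 + ["none"] * 18,  # targets coincide with G1 genes
    "TF_B": ["bound", "none"] * 12,         # targets spread evenly over classes
}

names = list(regulators)
raw_p = [lr_test(process, regulators[name])[1] for name in names]  # step 5
adj_p = benjamini_hochberg(raw_p)
for name, p, q in zip(names, raw_p, adj_p):
    print(f"{name}: p = {p:.3g}, adjusted p = {q:.3g}")
```

In this toy setting the profile of TF_A separates the G1 genes from the rest and receives a small p-value, while TF_B carries no information about the process labels. With real data, a regression routine is the more natural tool, since the closed form only covers the single categorical predictor case shown here.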
The R package with source code can be downloaded here.