g:SCS algorithm
The g:SCS method is the default multiple testing correction for p-values obtained from GO and pathway enrichment analysis. It corresponds to an experiment-wide threshold of a = 0.05, i.e. at least 95% of matches above threshold are statistically significant.
This approach is based on the idea that standard multiple testing corrections, such as the Bonferroni correction and the Benjamini-Hochberg False Discovery Rate, are designed for multiple tests that are independent of each other. That assumption does not hold for the analysis in g:GOSt, since GO consists of hierarchically related general and specific terms. The True Path Rule of GO states that genes associated with a given GO term t are implicitly associated with all more general parents of term t.
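The effect of the True Path Rule on gene sets can be sketched in a few lines of Python. This is only an illustration: the term identifiers, gene names, and the helper functions `ancestors` and `propagate` are hypothetical and are not part of g:Profiler.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child term -> set of direct parent terms.
PARENTS = {
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:D": {"GO:A"},
    "GO:A": set(),
}

# Hypothetical direct annotations: gene -> terms it is explicitly annotated to.
DIRECT = {
    "gene1": {"GO:C"},
    "gene2": {"GO:D"},
    "gene3": {"GO:B"},
}


def ancestors(term, parents):
    """Return every more general term reachable from `term`."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(direct, parents):
    """Map each term to the genes annotated to it directly or via a descendant."""
    term_to_genes = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            term_to_genes[term].add(gene)
            for anc in ancestors(term, parents):
                term_to_genes[anc].add(gene)
    return dict(term_to_genes)


gene_sets = propagate(DIRECT, PARENTS)
```

In this toy example the gene set of GO:C is contained in that of GO:B, which is in turn contained in that of GO:A, so enrichment tests on these three terms share genes and cannot be treated as independent.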
The g:SCS threshold is pre-calculated for query list sizes up to 1000 genes. Given a fixed input query size, g:SCS analytically approximates a threshold t corresponding to the 5% upper quantile of randomly generated queries of that size. All actual p-values resulting from the query are transformed to corrected p-values by multiplying them by the ratio of the approximate threshold t and the initial experiment-wide threshold a = 0.05. The algorithm considers the set structure underlying the gene sets annotated to terms of each organism, and should therefore give a tighter threshold for significant results. g:SCS thresholds agreed perfectly in simulations with randomly generated gene sets of fixed input query sizes.
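The idea behind the threshold and the p-value transformation can be sketched with a small Monte Carlo simulation. This is a stand-in, not the analytical approximation that g:SCS uses, and it rests on several assumptions: enrichment is scored with a hypergeometric tail test; t is taken as the value below which the smallest p-value of a random query falls in about 5% of simulations; and the ratio is oriented as a / t, so that a raw p-value equal to t maps to a. All function names and the toy gene sets are hypothetical.

```python
import random
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def min_p_value(query, gene_sets, universe_size):
    """Smallest enrichment p-value of one query across all terms."""
    best = 1.0
    for genes in gene_sets.values():
        k = len(query & genes)
        best = min(best, hypergeom_tail(k, universe_size, len(genes), len(query)))
    return best


def simulated_threshold(gene_sets, universe, query_size, a=0.05, n_sim=1000, seed=0):
    """Empirical t: the a-quantile of the minimum p-value over random queries."""
    rng = random.Random(seed)
    minima = sorted(
        min_p_value(set(rng.sample(universe, query_size)), gene_sets, len(universe))
        for _ in range(n_sim)
    )
    return minima[max(0, int(a * n_sim) - 1)]


def correct_p_values(p_values, t, a=0.05):
    """Scale raw p-values by a / t (assumed orientation), capped at 1."""
    return [min(1.0, p * a / t) for p in p_values]


# Hypothetical toy data: 60 genes and three terms, one nested in another.
universe = [f"g{i}" for i in range(60)]
gene_sets = {
    "general": set(universe[:30]),
    "specific": set(universe[:10]),  # subset of "general"
    "other": set(universe[40:55]),
}

t = simulated_threshold(gene_sets, universe, query_size=8, n_sim=500)
corrected = correct_p_values([t, t / 10], t)
```

Under these assumptions a corrected p-value is at most a = 0.05 exactly when the raw p-value is at most t, so results can still be read against the familiar 0.05 cut-off.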