We currently include data and annotations for organisms that are included in any of these BioMarts:
To ensure the quality and trustworthiness of our data, we have a policy of not including single species on a case-by-case basis. If you would like g:Profiler to include a new species or update an existing one, please contact the appropriate BioMart. It is also possible to create custom organisms from external data.
It is possible to generate a custom annotation set for any organism if you have a set of annotations for it. We support uploading annotations as GMT files.
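As a rough illustration of what such a file contains, the sketch below formats annotations as GMT rows, assuming the common GMT layout of one term per line: term ID, description, then the gene IDs, all tab-separated. The function name, term IDs and gene IDs are made up for the example.

```python
def to_gmt_line(term_id, description, genes):
    """Format one GMT row: term ID, description, then gene IDs, tab-separated.

    Duplicate gene IDs are dropped and the rest are sorted so the output is stable.
    """
    unique_genes = sorted(set(genes))
    return "\t".join([term_id, description, *unique_genes])


# Hypothetical annotations: (term ID, description) -> gene IDs.
annotations = {
    ("TERM:0001", "example term one"): ["geneB", "geneA", "geneB"],
    ("TERM:0002", "example term two"): ["geneC"],
}

gmt_text = "\n".join(
    to_gmt_line(term_id, description, genes)
    for (term_id, description), genes in annotations.items()
)
print(gmt_text)
```

The gene IDs written to the file should be the same kind of identifiers you later plan to use in your queries.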
There are step-by-step instructions and a helper tool for all steps of the process available at https://biit.cs.ut.ee/gmt-helper/.
For example, you might be able to download the GO annotations for your species from https://www.ebi.ac.uk/QuickGO (exporting the data might take 10 minutes or more) and the GO ontology file "go-basic.obo" from http://geneontology.org/docs/download-ontology/
Then use the "Convert tabular file to GMT" tool, followed by the "Reannotate GMT using OBO" tool. We recommend propagating the relations "is_a", "negatively_regulates", "part_of", "positively_regulates" and "regulates" as that is the set that g:Profiler uses.
The easiest solution in this case is to use the dataset as a custom GMT file. If you think that many other users would benefit from being able to query this dataset, contact us through the contact form or by email at biit.support@ut.ee. We are happy to incorporate novel, highly relevant datasets for a wider audience whenever the licenses allow it.
Unfortunately, we are restricted by data source licenses that do not allow us to share these two data sources with our users.
In g:Profiler we provide each data source as a separate GMT file and no longer provide combinations of data sources as separate GMT files. The easiest way to compose a pathways.gmt file is to concatenate the two files into a single pathways file. On Mac/Linux, the following command would do the trick:
    cat hsapiens.GO\:BP.name.gmt hsapiens.REAC.name.gmt > hsapiens.pathways.name.gmt
On Windows, opening the files in a text editor and copy-pasting all the rows from the REAC file together with all the rows from the GO:BP file gives the same result.
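For users who prefer a script that behaves the same on every operating system, the concatenation can also be sketched in Python. This is a minimal sketch, not an official tool: the function name is made up, and unlike a plain `cat` it skips blank lines and makes sure every row ends with a newline.

```python
from pathlib import Path


def concatenate_gmt(inputs, output):
    """Write the non-empty rows of every input GMT file, in order, to one output file.

    Returns the number of rows written.
    """
    rows_written = 0
    with open(output, "w", encoding="utf-8") as out:
        for path in inputs:
            for line in Path(path).read_text(encoding="utf-8").splitlines():
                if line.strip():  # skip blank lines between or after rows
                    out.write(line + "\n")
                    rows_written += 1
    return rows_written
```

A call such as `concatenate_gmt(["go_bp.gmt", "reac.gmt"], "pathways.gmt")` would then produce the combined file; the file names here are placeholders for the GMT files you downloaded.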
Most often you need to check whether the selected organism matches your gene list. You can do this on the ‘Query info’ tab. There you can also find a list of the gene IDs that g:Profiler recognised and included in the query, and a list of the genes that were not recognised and were left out of the query. This helps you tell whether none of your genes were recognised or only some of them.
Also, if your query list contains versioned gene identifiers, it might be better to remove the version information.
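Removing the version information can be done in bulk before pasting the list into the query box. The sketch below assumes Ensembl-style stable IDs with a trailing ".<number>" version (for example "ENSG00000139618.15"); restricting the pattern to IDs that start with "ENS" is a choice made here so that other names containing a dot are left alone.

```python
import re

# "ENS", an optional species prefix plus feature letter, 11 digits, then ".<version>".
_VERSIONED_ENSEMBL_ID = re.compile(r"^(ENS[A-Z]+\d{11})\.\d+$")


def strip_version(gene_id):
    """Return the ID without its version suffix; other identifiers pass through unchanged."""
    gene_id = gene_id.strip()
    match = _VERSIONED_ENSEMBL_ID.match(gene_id)
    return match.group(1) if match else gene_id


query = ["ENSG00000139618.15", "ENSMUSG00000017167.3", "ENSG00000141510"]
print([strip_version(g) for g in query])
```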
For a small query of genes that are not directly related, no terms might show up as significant. If you still want to explore which terms the input genes belong to, pick the All results option from the advanced options section. This way you can explore all the terms to which at least one input gene belongs.
Sometimes a gene name is related to several gene IDs, and only one of its aliases is linked in Ensembl. A well-known example is the Oct4 gene, which is not recognized, while its alias Pou5f1 gives results.
It is important to remember that if you use a custom GMT file as the data source, you can only query the same IDs as in the data file, because g:Profiler is not capable of guessing the correct organism for your data. If you do not know the required gene identifiers by heart, first use g:Convert to convert the identifiers at hand to the format used in the GMT file.
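Before running a query against a custom GMT file, it can help to check which of your identifiers actually occur in the file. The following is a minimal sketch with made-up function names and data; it assumes the usual GMT layout in which gene IDs start at the third tab-separated column, and it compares IDs exactly (case-sensitively).

```python
def gmt_gene_ids(gmt_text):
    """Collect every gene ID (third column onwards) found in GMT-formatted text."""
    ids = set()
    for line in gmt_text.splitlines():
        fields = line.split("\t")
        ids.update(field for field in fields[2:] if field)
    return ids


def split_query(query, gmt_text):
    """Split a query into IDs present in the GMT text and IDs missing from it."""
    known = gmt_gene_ids(gmt_text)
    found = [gene for gene in query if gene in known]
    missing = [gene for gene in query if gene not in known]
    return found, missing


# Hypothetical custom GMT content and query list.
example_gmt = "TERM:0001\tfirst term\tgeneA\tgeneB\nTERM:0002\tsecond term\tgeneC"
found, missing = split_query(["geneA", "geneC", "geneX"], example_gmt)
print("found:", found)
print("missing:", missing)
```

Identifiers that end up in the missing list are candidates for conversion with g:Convert before querying.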
We get gene coordinates from Ensembl and use the minimum transcript start coordinate and maximum transcript end coordinate to define gene location. We match a gene if the input coordinate provided by the user overlaps with gene coordinates.
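The matching rule described above can be sketched as follows. This is an illustration only, not g:Profiler's actual code: the function names and coordinates are invented, and the intervals are assumed to be inclusive on both ends and located on the same chromosome.

```python
def gene_span(transcripts):
    """Gene location from its transcripts: (minimum start, maximum end).

    `transcripts` is a list of (start, end) pairs.
    """
    starts = [start for start, _ in transcripts]
    ends = [end for _, end in transcripts]
    return min(starts), max(ends)


def overlaps(query_start, query_end, gene_start, gene_end):
    """True if two inclusive intervals share at least one position."""
    return query_start <= gene_end and gene_start <= query_end


# Hypothetical gene with three transcripts.
start, end = gene_span([(1200, 1800), (1000, 1500), (1300, 2100)])
print(start, end)
print(overlaps(2050, 2300, start, end))
print(overlaps(2101, 2300, start, end))
```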
The genome version can be seen from the “Show data versions” link under the “Data sources” option.
To see which genes belong to a particular term, select “>>” above the p-value column in the “Detailed Results” tab. This opens additional information about the overlap between the query and the term. The columns give an overview of the term size (T), the query size (Q), the overlap (T∩Q) and the size of the gene universe (U). Clicking on the value in the overlap column (T∩Q) opens a new tab in g:Convert with all the gene IDs belonging to that particular term.
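To make the four quantities concrete, the sketch below computes T, Q, T∩Q and U from plain gene sets. It also includes a textbook hypergeometric tail probability as one way such counts are commonly turned into an enrichment p-value; that part is an assumption added for illustration and says nothing about how g:Profiler computes or corrects its own p-values. All names and data are invented.

```python
from math import comb


def overlap_stats(term_genes, query_genes, universe):
    """Return the sizes T, Q, T∩Q and U, counting only genes inside the universe."""
    universe = set(universe)
    term = set(term_genes) & universe
    query = set(query_genes) & universe
    return {"T": len(term), "Q": len(query), "TnQ": len(term & query), "U": len(universe)}


def hypergeom_tail(T, Q, k, U):
    """P(overlap >= k) when Q genes are drawn without replacement from U, T of which are in the term."""
    total = comb(U, Q)
    return sum(comb(T, i) * comb(U - T, Q - i) for i in range(k, min(T, Q) + 1)) / total


# Hypothetical universe of 10 genes, a term with 4 genes and a query of 3 genes.
universe = [f"g{i}" for i in range(10)]
stats = overlap_stats(["g0", "g1", "g2", "g3"], ["g0", "g1", "g9"], universe)
print(stats)
print(hypergeom_tail(stats["T"], stats["Q"], stats["TnQ"], stats["U"]))
```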
The same functionality is available
We omitted hierarchical sorting from g:Profiler on purpose, as the function was only really applicable to Gene Ontology. Even for Gene Ontology, the hierarchy is not really comparable between the subgraphs (CC, BP, MF). Instead, to highlight closely related terms in the results, we introduced a Manhattan plot. On the plot, hierarchically related terms are close to each other on the x-axis and should therefore be easier for users to notice.
Most of the data comes from the Ensembl database, so we follow the Ensembl quarterly update cycle with a lag of a few weeks to months (to verify that all the data works in g:Profiler). Other data sources (KEGG, TRANSFAC etc.) are updated during the same update routine, straight from the data sources. The current data versions can be seen from the “Show data versions” link under the “Data sources” option.
We aim to keep g:Profiler up to date, and this also applies to all of our data sources. Over time, the data sources we base our service on inevitably change. However, all previous data and the accompanying user interface versions are preserved and archived. You can find previous g:Profiler releases at https://biit.cs.ut.ee/gprofiler/archives/. This allows users to reproduce their own or others' results in the context of previous releases. Our aim is to provide a service that is transparent and supports reproducible science.
Currently we do not provide separate instances of the g:Profiler service, such as Docker images. If you would like to run thousands of queries or build your own service on g:Profiler, please contact us via the contact form or by email at biit.support@ut.ee, and we will try to find an optimal solution that allows heavy usage of our service.
Yes, we provide multiple APIs for programmatic access. Documentation is available on the APIs page. We encourage users to include g:Profiler in their pipelines using the APIs that we provide. If you are building an external service on ours, we would appreciate it if you let us know by contacting us at biit.support@ut.ee. It is also a good idea to sign up for the announcements mailing list at https://lists.ut.ee/wws/subscribe/gprofiler.announcements, as we announce data releases and changes in our services there. All the services using g:Profiler that we are aware of are listed on a dedicated page.
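As a rough idea of what calling an API from a pipeline could look like, here is a minimal sketch using only the Python standard library. The endpoint URL and the "organism" and "query" field names are assumptions made for this example and should be checked against the APIs page; the helper names are invented.

```python
import json
import urllib.request

# Assumed endpoint for enrichment queries; verify it on the APIs page before use.
GOST_URL = "https://biit.cs.ut.ee/gprofiler/api/gost/profile/"


def build_gost_request(genes, organism="hsapiens"):
    """Build (but do not send) a JSON POST request for an enrichment query."""
    payload = json.dumps({"organism": organism, "query": list(genes)}).encode("utf-8")
    return urllib.request.Request(
        GOST_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_gost(genes, organism="hsapiens", timeout=60):
    """Send the request and return the decoded JSON response (needs network access)."""
    request = build_gost_request(genes, organism)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payload first and to add pauses between calls when running many queries.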
Yes, you can. Please sign up for our announcements mailing list at https://lists.ut.ee/wws/subscribe/gprofiler.announcements to receive information about new data releases and changes in our services.
Our new R package is on CRAN, and it offers R users the same functionality as the web server.
We value user confidentiality and believe that it is up to the users to choose with whom and where to share their research findings. Therefore, by default we store neither the "Query" nor the "Background" gene lists of our users. Once the results are displayed to the user, we remove the query from memory. The only exception is when the user chooses to share the results via a short link or query URL; in that case we need to store the query to be able to reproduce the results in the future.
However, in order to offer better service for our customers and impress our funders with high-level statistics, we save the parameters used by our users. This means that we keep data sources (GO, REACTOME etc) along with the options used (ordered list, significance threshold method etc) to monitor what are the popular and not so popular options and data sources that our customers use.
g:Profiler is open to all users, both academic and non-academic, free of charge. We appreciate it if you share your excitement about our service with your colleagues and cite us when you have found our service useful.