g:Profiler – a web server for functional enrichment analysis and conversions of gene lists

g:GOSt: Functional profiling
g:Convert: Gene ID conversion
g:Orth: Orthology search
g:SNPense: SNP ID to gene name

g:Profiler client libraries

R client

g:Profiler has an up-to-date R client library, gprofiler2, available from CRAN or conda-forge. For more documentation, see the help page.

Python client

g:Profiler has an up-to-date Python client library, gprofiler-official, available from PyPI or conda-forge. For more documentation, see the package description.

g:Profiler API

g:Profiler requests are generally made as POST requests with a JSON body and they return JSON output.
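
As a rough illustration of the request shape described above, the sketch below builds (but does not send) a POST request with a JSON body using only the Python standard library. BASE_URL is a placeholder, since this page gives only the endpoint paths; the Content-Type header is an assumption based on common practice for JSON APIs.

```python
import json
import urllib.request

# Hypothetical host: this page lists only paths, so replace this value with
# the address of the g:Profiler server you are using.
BASE_URL = "https://example.org"


def build_post_request(path, payload):
    """Return an unsent POST request that carries `payload` as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_post_request(
    "/gprofiler_beta/api/gost/profile/",
    {"organism": "hsapiens", "query": ["CASQ2", "CASQ1"]},
)

# Sending is left out on purpose. With a real host it would look like:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)["result"]
```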

g:GOSt

URL: /gprofiler_beta/api/gost/profile/
METHOD: POST
PARAMETERS:

organism

ID of species to be queried. List of possible ID-s can be seen at the organisms list page.

organism:"hsapiens"

query

List of genes to be queried. Can be a list of strings or a dictionary of lists if multiple queries are submitted simultaneously.

query:["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]

query:{
    first_query:["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
    second_query:["MLXIPL","SMARCB1","PIH1D1","SMARCA4","AGER"]
}
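
The two accepted shapes of the query parameter can be sketched in Python as follows. The helper name build_gost_body is hypothetical and exists only for this illustration; note that in the serialized JSON the query names become quoted string keys.

```python
import json


def build_gost_body(organism, queries):
    """Build a g:GOSt request body (illustrative helper, not part of the API).

    `queries` is either a list of gene IDs (one query) or a dict that maps a
    query name to a list of gene IDs (several queries in one request).
    """
    if isinstance(queries, dict):
        query = {name: list(genes) for name, genes in queries.items()}
    else:
        query = list(queries)
    return {"organism": organism, "query": query}


single = build_gost_body("hsapiens", ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"])
multi = build_gost_body(
    "hsapiens",
    {
        "first_query": ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
        "second_query": ["MLXIPL", "SMARCB1", "PIH1D1", "SMARCA4", "AGER"],
    },
)
multi_json = json.dumps(multi)  # the text that would be sent as the POST body
```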

sources

List of datasources to use for query. NB! Not all sources are available for all species.

source ID	Source name
GO:MF	molecular function
GO:CC	cellular component
GO:BP	biological process
KEGG	Kyoto Encyclopedia of Genes and Genomes
REAC	Reactome
WP	WikiPathways
TF	Transfac
MIRNA	miRTarBase
HPA	Human Protein Atlas
CORUM	CORUM protein complexes
HP	Human Phenotype Ontology

Empty list is equivalent to full list:

sources:[]

sources:["GO:MF","GO:CC","GO:BP","KEGG","REAC","WP","TF","MIRNA","HPA","CORUM","HP"]
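
A client may want to make the "empty list means every source" rule explicit before sending a request. The sketch below is one way to do that; both helper names are hypothetical, and the server may accept IDs that are not in the table above, so the second helper only reports unlisted IDs rather than rejecting them.

```python
# Source IDs copied from the table above.
ALL_SOURCES = [
    "GO:MF", "GO:CC", "GO:BP", "KEGG", "REAC", "WP",
    "TF", "MIRNA", "HPA", "CORUM", "HP",
]


def expand_sources(sources):
    """Return the explicit list that an empty `sources` value stands for."""
    return list(sources) if sources else list(ALL_SOURCES)


def unlisted_sources(sources):
    """Return the requested source IDs that do not appear in the table."""
    return [s for s in sources if s not in ALL_SOURCES]


everything = expand_sources([])            # same as the full list
only_kegg = expand_sources(["KEGG"])       # explicit lists pass through
suspicious = unlisted_sources(["KEGG", "XYZ"])
```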

user_threshold

Float between 0 and 1. Defines a custom significance threshold.

user_threshold:0.0001

user_threshold:1e-8

all_results

Boolean. Default false. If true, the API also returns results that do not pass the significance threshold.

ordered

Boolean. Default false. If true, the API performs ordered query. Read more.

combined

Boolean. Default false. If true, runs queries simultaneously and combines the result. See query for how to supply more than one query in request. Read more.

measure_underrepresentation

Boolean, default false. If true, g:GOSt returns significantly under-represented functional terms. Read more.

no_iea

Boolean, default false. If true, g:GOSt excludes electronic annotations from GO terms. Read more.

domain_scope

String, default 'annotated'. Other options 'known', 'custom', 'custom_annotated'. If 'custom' or 'custom_annotated', the 'background' parameter must be populated. 'custom' enforces the option "Custom over all known genes" and 'custom_annotated' enforces the option "Custom over annotated genes". Read more.

numeric_ns

String. Default "ENTREZGENE". Indicating which namespace to use when IDs are numeric. Read more.

significance_threshold_method

String. Multiple testing correction method. Default 'g_SCS'. Other options are 'bonferroni' and 'fdr'. Read more.

background

List of strings. Should be a list of gene IDs (preferably Ensembl IDs) to be considered as the statistical background for the query. To use this parameter, 'domain_scope' should be set to one of the custom options. Read more.

output

String. Default "json". No other options should be used at the moment.

no_evidences

Boolean. Default false. If true, skips lookup for evidence codes. Speeds up queries, if there is no interest in evidence codes.

highlight

Optional boolean. Default false. Adds the "highlighted" column to results if set to value "true". For more info, see the documentation.
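
Several of the parameters above constrain each other, so a client-side check before sending can catch mistakes early. The following is a minimal sketch under the rules stated on this page; check_gost_params is a hypothetical helper and the server remains the authority on what is valid.

```python
# Allowed values as listed in the parameter descriptions above.
CORRECTION_METHODS = {"g_SCS", "bonferroni", "fdr"}
DOMAIN_SCOPES = {"annotated", "known", "custom", "custom_annotated"}


def check_gost_params(params):
    """Return a list of problems found in a g:GOSt request body."""
    problems = []

    threshold = params.get("user_threshold")
    if threshold is not None and not 0 <= threshold <= 1:
        problems.append("user_threshold must be between 0 and 1")

    method = params.get("significance_threshold_method", "g_SCS")
    if method not in CORRECTION_METHODS:
        problems.append("unknown significance_threshold_method: %r" % method)

    scope = params.get("domain_scope", "annotated")
    if scope not in DOMAIN_SCOPES:
        problems.append("unknown domain_scope: %r" % scope)
    elif scope.startswith("custom") and not params.get("background"):
        # Both custom scopes need a background to be supplied.
        problems.append("domain_scope %r requires 'background'" % scope)

    return problems


ok = check_gost_params({"organism": "hsapiens", "query": ["CASQ2"]})
bad = check_gost_params(
    {"organism": "hsapiens", "query": ["CASQ2"],
     "user_threshold": 2, "domain_scope": "custom"}
)
```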

g:GOSt query result fields

These are the result fields for most simple queries.

name

Term name.

description

Term description if available. If not available, repeats the term name.

native

Term ID in its native namespace. For non-GO terms, the ID is prefixed with the datasource abbreviation.

parents

List of native IDs that are hierarchically above the term. For non-hierarchical datasources, points to artificial root node if applicable.

p_value

Hypergeometric p-value after correction for multiple testing.

goshv

Internal g:Profiler numeric ID. Unique for the term. Not consistent across data updates.

significant

Indicator for statistically significant results.

effective_domain_size

The total number of genes "in the universe", which is used as one of the four parameters for the hypergeometric probability function of statistical significance.

intersection_size

The number of genes in the query that are annotated to the corresponding term.

term_size

The number of genes that are annotated to the term.

query_size

The number of genes that were included in the query. This might be different from the size of the original query list if:

any terms in the original query list were mapped to multiple Ensembl gene-IDs;
any terms in the original query list failed to be mapped to any Ensembl gene-ID;
an ordered query was performed and the optimal cutoff point for the term was found before the end of the query;
the query was made with "domain_scope" set to "annotated" or "custom" - then the original query gets intersected with either the gene set annotated to the current datasource or the custom background provided.

precision

The proportion of genes in the input list that are annotated to the function.

Defined as intersection_size/query_size.

recall

The proportion of functionally annotated genes that the query recovers.

Defined as intersection_size/term_size.
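
The two ratios can be recomputed from the size fields of a result entry, which is a quick way to check one's reading of the definitions. The numbers below are made up for illustration and do not come from a real response.

```python
def precision_recall(term):
    """Recompute precision and recall from a result entry's size fields."""
    precision = term["intersection_size"] / term["query_size"]
    recall = term["intersection_size"] / term["term_size"]
    return precision, recall


# Made-up entry: 4 of 5 query genes are annotated to a term with 40 genes.
example_term = {"intersection_size": 4, "query_size": 5, "term_size": 40}
precision, recall = precision_recall(example_term)
# precision is 4/5 = 0.8 and recall is 4/40 = 0.1
```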

intersections

List of lists of strings. The elements in this list correspond to query genes and are in the same order as the genes in "genes_metadata" -> "query" -> `query_name` -> "ensgs". (note that the query_name is variable)

The gene ID-s in the "ensgs" list are in the same order as in the "intersections" structure. Empty lists mean "no intersection", lists with elements mean "this gene is part of the intersection between the term and query" and `null` values mean that the gene wasn't looked at for the response (possible if making ordered queries).
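
Because "intersections" and "ensgs" are aligned by position, the genes that overlap a term can be recovered by walking both lists together. The sketch below assumes the caller has already looked up the "ensgs" list for the right query name; the gene IDs and inner strings are placeholders, not real data.

```python
def genes_in_intersection(term, ensgs):
    """Return the gene IDs from `ensgs` that intersect with `term`.

    Each position in term["intersections"] is one of:
      None            the gene was not examined (possible in ordered queries)
      []              the gene is not part of the intersection
      non-empty list  the gene is part of the intersection
    """
    hits = []
    for gene, entry in zip(ensgs, term["intersections"]):
        if entry:  # skips both None and empty lists
            hits.append(gene)
    return hits


# Placeholder data for illustration only.
ensgs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
term = {"intersections": [["s1"], [], None, ["s2", "s3"]]}
overlap = genes_in_intersection(term, ensgs)
```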

source

The abbreviation of the datasource for the term. Currently, the possible datasources are

GO:MF - Gene Ontology Molecular Function branch
GO:BP - Gene Ontology Biological Process branch
GO:CC - Gene Ontology Cellular Component branch
KEGG - KEGG pathways
REAC - Reactome pathways
WP - WikiPathways
TF - Transfac transcription factor binding site predictions
MIRNA - mirTarBase miRNA targets
HPA - Human Protein Atlas expression data
CORUM - Manually annotated protein complexes from mammalian organisms.
HP - Human Phenotype Ontology, a standardized vocabulary of phenotypic abnormalities encountered in human disease.

query

The name of the input query, which by default is the order of the query with the prefix "query_" (e.g. query_1, query_2). If set by the user, the value is the user-defined name for the query.

source_order

The numeric order for the term within its datasource. Important for drawing reproducible manhattan plots across different platforms.

group_id

The identifier that defines the group of terms that are connected by some previously known relations. For example, this is an identifier for different Gene Ontology subgraphs that can be formed from the results based on the connections defined in GO.
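
A common first step after receiving results is to keep the significant terms and order them by p-value within each datasource. The sketch below uses only the fields described above; the three entries are invented for illustration and the function name is hypothetical.

```python
from collections import defaultdict


def significant_terms_by_source(results):
    """Group significant result entries by source, lowest p-value first."""
    grouped = defaultdict(list)
    for term in results:
        if term["significant"]:
            grouped[term["source"]].append(term)
    for terms in grouped.values():
        terms.sort(key=lambda t: t["p_value"])
    return dict(grouped)


# Invented entries; a real response carries many more fields per term.
results = [
    {"native": "T1", "source": "GO:BP", "p_value": 1e-3, "significant": True},
    {"native": "T2", "source": "GO:BP", "p_value": 1e-6, "significant": True},
    {"native": "T3", "source": "KEGG", "p_value": 0.4, "significant": False},
]
by_source = significant_terms_by_source(results)
top_go_bp = [t["native"] for t in by_source["GO:BP"]]
```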

Simple python example

import requests
r = requests.post(
    url='/gprofiler_beta/api/gost/profile/',
    json={
        'organism':'hsapiens',
        'query':["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
    }
    )
r.json()['result']

Simple CURL example

curl -X 'POST' -d '{"organism": "hsapiens", "query": ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]}' '/gprofiler_beta/api/gost/profile/'

Python example with more parameters set to non-default values

import requests
r = requests.post(
    url='/gprofiler_beta/api/gost/profile/',
    json={
    'organism':'hsapiens',
    'query':["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
    'sources':['GO'], # only look into Gene Ontology terms.
    'user_threshold':1e-8, # reduce the significance threshold.
    'significance_threshold_method':'bonferroni', # use Bonferroni correction instead of the default 'g_SCS'.
    'no_evidences':True, # skip lookup for evidence codes. Speeds up queries, if there is no interest in evidence codes.
    'no_iea':True, # ignore electronically annotated GO annotations.

    'domain_scope':'custom', # use the genes in the probe as the statistical background.
    'background':'AFFY_HG_U133A'
    },
    headers={
    'User-Agent':'FullPythonRequest'
    }
)
r.json()['result']

g:Convert

URL: /gprofiler_beta/api/convert/convert/
METHOD: POST
PARAMETERS:

organism

String. ID of species to be queried. List of possible ID-s can be seen at the organisms list page.

organism:"hsapiens"

query

String or list of strings

query:["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]

query:"CASQ2 CASQ1 GSTO1 DMD GSTM2"
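
Since the query may be given either as a list or as a single string, a client can normalize both forms to a list before building the request body. This sketch assumes the string form is whitespace-separated, as in the example above; build_convert_body is a hypothetical helper.

```python
def as_id_list(query):
    """Return `query` as a list of IDs, splitting a string on whitespace."""
    if isinstance(query, str):
        return query.split()
    return list(query)


def build_convert_body(organism, target, query):
    """Build a g:Convert request body with the query normalized to a list."""
    return {"organism": organism, "target": target, "query": as_id_list(query)}


from_string = build_convert_body("hsapiens", "UCSC", "CASQ2 CASQ1 GSTO1 DMD GSTM2")
from_list = build_convert_body(
    "hsapiens", "UCSC", ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]
)
```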

target

String. Namespace to convert to.

target:"UCSC"

numeric_ns

String. Default "ENTREZGENE_ACC". Indicating which namespace to use when input IDs are numeric. Read more.

output

String. Default "json". No other options should be used at the moment.

Simple python example

import requests
r = requests.post(
    url='/gprofiler_beta/api/convert/convert/',
    json={
        'organism':'hsapiens',
        'target':'UCSC',
        'query':["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
    }
    )
r.json()['result']

Simple CURL example

curl -X 'POST' -d '{"organism": "hsapiens", "target": "UCSC", "query": ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]}' '/gprofiler_beta/api/convert/convert/'

g:Orth

URL: /gprofiler_beta/api/orth/orth/
METHOD: POST
PARAMETERS:

organism

String. ID of species to be queried. List of possible ID-s can be seen at the organisms list page.

organism:"hsapiens"

query

String or list of strings

query:["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]

query:"CASQ2 CASQ1 GSTO1 DMD GSTM2"

target

String. Organism ID for orthology targets.

target:"mmusculus"

numeric_ns

String. Default "ENTREZGENE_ACC". Indicating which namespace to use when input IDs are numeric. Read more.

output

String. Default "json". No other options should be used at the moment.

Simple python example

import requests
r = requests.post(
    url='/gprofiler_beta/api/orth/orth/',
    json={
        'organism':'hsapiens',
        'target':'mmusculus',
        'query':["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
    }
    )
r.json()['result']

Simple CURL example

curl -X 'POST' -d '{"organism": "hsapiens", "target": "mmusculus", "query": ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]}' '/gprofiler_beta/api/orth/orth/'

g:SNPense

URL: /gprofiler_beta/api/snpense/snpense/
METHOD: POST
PARAMETERS:

query

String or list of strings.

query:["rs11734132", "rs7961894", "rs4305276", "rs17396340"]

query:"rs11734132 rs7961894 rs4305276 rs17396340"

Simple python example

import requests
r = requests.post(
    url='/gprofiler_beta/api/snpense/snpense/',
    json={
        'organism':'hsapiens',
        'query':['rs11734132', 'rs7961894', 'rs4305276'],
    }
    )
r.json()['result']

Simple CURL example

curl -X 'POST' -d '{"query": ["rs11734132", "rs7961894", "rs4305276"]}' '/gprofiler_beta/api/snpense/snpense/'

Organisms list

This is a utility method for discovering the species available from g:Profiler along with their associated namespaces.
URL: /gprofiler_beta/api/util/organisms_list/
METHOD: GET
PARAMETERS:

organism

String. ID of a particular species. The default is no ID - in that case the endpoint returns data for all available species.

organism=hsapiens

extra_data

Boolean. Whether to include lists of available namespaces for organisms.

extra_data=True
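
For clients written in Python, the two query-string parameters above can be encoded with the standard library. In this sketch the base URL is a placeholder (this page gives only the path) and organisms_list_url is a hypothetical helper; the function only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

ORGANISMS_PATH = "/gprofiler_beta/api/util/organisms_list"


def organisms_list_url(base_url, organism=None, extra_data=False):
    """Build the GET URL for the organisms list endpoint."""
    params = {}
    if organism is not None:
        params["organism"] = organism
    if extra_data:
        params["extra_data"] = "True"
    url = base_url + ORGANISMS_PATH
    if params:
        url += "?" + urlencode(params)
    return url


# "https://example.org" is a placeholder host, not the real server address.
all_species = organisms_list_url("https://example.org")
human_with_namespaces = organisms_list_url(
    "https://example.org", organism="hsapiens", extra_data=True
)
```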

Simple CURL example

curl  '/gprofiler_beta/api/util/organisms_list'

CURL example for a specific organism with list of namespaces

curl  '/gprofiler_beta/api/util/organisms_list?organism=hsapiens&extra_data=True'

CURL example with json parameters

curl  -H "Content-type: application/json" -H "Accept: application/json" -X 'GET' -d '{"organism":"hsapiens", "extra_data":true}'  '/gprofiler_beta/api/util/organisms_list'

Data versions

This is a utility method for discovering which versions of the data sources are available for g:GOSt.
URL: /gprofiler_beta/api/util/data_versions/
METHOD: GET
PARAMETERS:

organism

String. ID of a particular species.

organism=hsapiens

CURL example

curl  '/gprofiler_beta/api/util/data_versions?organism=hsapiens'

CURL example with json parameters

curl  -H "Content-type: application/json" -H "Accept: application/json" -X 'GET' -d '{"organism":"hsapiens"}'  '/gprofiler_beta/api/util/data_versions'
