g:Profiler client libraries

R client

g:Profiler has an up-to-date R client library, gprofiler2, available from CRAN or conda-forge. For more documentation, see the help page.

Python client

g:Profiler has an up-to-date Python client library, gprofiler-official, available from PyPI or conda-forge. For more documentation, see the package description.

g:Profiler API

g:Profiler requests are generally made as POST requests with a JSON body and they return JSON output.
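The request pattern described above can be sketched with the Python standard library alone. This is a minimal sketch, not an official client: `BASE_URL` is a placeholder (the endpoints on this page are given as paths only), and `build_request` and `send` are hypothetical helper names. The JSON `Content-Type` header is set explicitly as a precaution.

```python
import json
import urllib.request

# Placeholder host (assumption): substitute the server that hosts the g:Profiler API.
BASE_URL = "https://example.org"


def build_request(path, body):
    """Build (but do not send) a POST request carrying `body` as JSON."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request(
    "/gprofiler_beta/api/gost/profile/",
    {"organism": "hsapiens", "query": ["CASQ2", "CASQ1"]},
)
# send(req) would perform the network call; it is not executed here.
```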

g:GOSt

URL: /gprofiler_beta/api/gost/profile/
METHOD: POST
PARAMETERS:
organism
ID of the species to be queried. The list of possible IDs can be seen on the organisms list page.
organism:"hsapiens"
query
List of genes to be queried. Can be a list of strings or a dictionary of lists if multiple queries are submitted simultaneously.
query:["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]
query:{
                first_query:["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
                second_query:["MLXIPL","SMARCB1","PIH1D1","SMARCA4","AGER"]
                }
sources
List of datasources to use for the query. Note that not all sources are available for all species.
Source ID   Source name
GO:MF       molecular function
GO:CC       cellular component
GO:BP       biological process
KEGG        Kyoto Encyclopedia of Genes and Genomes
REAC        Reactome
WP          WikiPathways
TF          Transfac
MIRNA       miRTarBase
HPA         Human Protein Atlas
CORUM       CORUM protein complexes
HP          Human Phenotype Ontology
An empty list is equivalent to the full list:
sources:[]
sources:["GO:MF","GO:CC","GO:BP","KEGG","REAC","WP","TF","MIRNA","HPA","CORUM","HP"]
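When sources are supplied by hand, a client-side check against the table above can catch typos before a request is sent. The sketch below assumes the table is complete; the server's own list of sources may change with data updates, and `check_sources` is a hypothetical helper.

```python
# Source IDs copied from the table above (assumed complete for this sketch).
KNOWN_SOURCES = {"GO:MF", "GO:CC", "GO:BP", "KEGG", "REAC", "WP",
                 "TF", "MIRNA", "HPA", "CORUM", "HP"}


def check_sources(sources):
    """Return the requested sources, raising on IDs not in the table above.

    An empty list is passed through unchanged, since the API treats it
    as equivalent to the full list.
    """
    unknown = [s for s in sources if s not in KNOWN_SOURCES]
    if unknown:
        raise ValueError(f"unknown source IDs: {unknown}")
    return list(sources)


print(check_sources(["GO:BP", "KEGG"]))  # ['GO:BP', 'KEGG']
```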
user_threshold
Float between 0 and 1, used to define a custom significance threshold.
user_threshold:0.0001
user_threshold:1e-8
all_results
Boolean. Default false. If true, the API also returns results that do not pass the significance threshold.
ordered
Boolean. Default false. If true, the API performs an ordered query.
combined
Boolean. Default false. If true, runs the queries simultaneously and combines the results. See query for how to supply more than one query in a request.
measure_underrepresentation
Boolean. Default false. If true, g:GOSt returns significantly under-represented functional terms.
no_iea
Boolean. Default false. If true, g:GOSt excludes electronic annotations from GO terms.
domain_scope
String. Default 'annotated'. Other options are 'known', 'custom' and 'custom_annotated'. If 'custom' or 'custom_annotated' is used, the 'background' parameter must be populated. 'custom' enforces the option "Custom over all known genes" and 'custom_annotated' enforces the option "Custom over annotated genes".
numeric_ns
String. Default "ENTREZGENE". Indicates which namespace to use when IDs are numeric.
significance_threshold_method
String. Multiple testing correction method. Default 'g_SCS'. Other options are 'bonferroni' and 'fdr'.
background
List of strings. Should be a list of gene IDs (preferably Ensembl IDs) to be considered as the statistical background for the query. To use this parameter, 'domain_scope' should be set to 'custom' or 'custom_annotated' (see domain_scope).
output
String. Default "json". No other options should be used at the moment.
no_evidences
Boolean. Default false. If true, skips the lookup of evidence codes. This speeds up queries when evidence codes are not needed.
highlight
Optional boolean. Default false. If set to true, adds the "highlighted" column to the results. For more info, see the documentation.
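The parameters above can be assembled into a request body by a small builder that leaves optional parameters out unless they are set, so the server defaults apply. This is a sketch under stated assumptions: `build_gost_payload` is a hypothetical name, the (0, 1] bound on `user_threshold` is one reading of "between 0 and 1", and the background check mirrors the domain_scope description rather than any documented server-side validation.

```python
def build_gost_payload(organism, query, *, user_threshold=None,
                       domain_scope=None, background=None, **options):
    """Assemble a g:GOSt request body, leaving unset parameters out.

    `query` may be a list of gene IDs or a dict mapping query names to lists.
    Any other keyword arguments (e.g. sources, no_iea) are copied through as-is.
    """
    payload = {"organism": organism, "query": query}
    if user_threshold is not None:
        # Treating the bounds as (0, 1] is an assumption about "between 0 and 1".
        if not 0 < user_threshold <= 1:
            raise ValueError("user_threshold must be in (0, 1]")
        payload["user_threshold"] = user_threshold
    if domain_scope is not None:
        # Custom scopes need a background (see domain_scope above).
        if domain_scope in ("custom", "custom_annotated") and not background:
            raise ValueError(f"domain_scope={domain_scope!r} requires a background")
        payload["domain_scope"] = domain_scope
    if background is not None:
        payload["background"] = background
    payload.update(options)
    return payload


# Two named queries submitted in one request, as in the query example above.
body = build_gost_payload(
    "hsapiens",
    {"first_query": ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
     "second_query": ["MLXIPL", "SMARCB1", "PIH1D1", "SMARCA4", "AGER"]},
    user_threshold=1e-8,
    sources=["GO:BP", "KEGG"],
)
```

Omitting unset keys, rather than sending explicit defaults, keeps the request valid even if a server-side default changes.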

g:GOSt query result fields

These are the result fields for most simple queries.
name
Term name.
description
Term description if available. If not available, repeats the term name.
native
Term ID in its native namespace. For non-GO terms, the ID is prefixed with the datasource abbreviation.
parents
List of native IDs that are hierarchically above the term. For non-hierarchical datasources, points to artificial root node if applicable.
p_value
Hypergeometric p-value after correction for multiple testing.
goshv
Internal g:Profiler numeric ID. Unique for the term. Not consistent across data updates.
significant
Indicator for statistically significant results.
effective_domain_size
The total number of genes "in the universe", used as one of the four parameters of the hypergeometric probability function for statistical significance.
intersection_size
The number of genes in the query that are annotated to the corresponding term.
term_size
The number of genes that are annotated to the term.
query_size
The number of genes that were included in the query. This may differ from the size of the original query list if:
  1. any IDs in the original query list were mapped to multiple Ensembl gene IDs;
  2. any IDs in the original query list could not be mapped to any Ensembl gene ID;
  3. an ordered query was performed and the optimal cutoff point for the term was found before the end of the query;
  4. the query was made with "domain_scope" set to "annotated" or "custom"; in that case the original query is intersected with either the gene set annotated to the current datasource or the custom background provided.
precision
The proportion of genes in the input list that are annotated to the function.
Defined as intersection_size/query_size.
recall
The proportion of functionally annotated genes that the query recovers.
Defined as intersection_size/term_size.
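As a worked example of the two definitions, using hypothetical sizes: a query of 5 genes, 3 of which are annotated to a term that has 60 annotated genes, gives a precision of 3/5 = 0.6 and a recall of 3/60 = 0.05. The same arithmetic in Python (`precision_recall` is a hypothetical helper, not part of the API):

```python
def precision_recall(intersection_size, query_size, term_size):
    """Precision and recall as defined for the g:GOSt result fields."""
    precision = intersection_size / query_size
    recall = intersection_size / term_size
    return precision, recall


# Hypothetical sizes: 3 of 5 query genes hit a term annotated to 60 genes.
p, r = precision_recall(intersection_size=3, query_size=5, term_size=60)
print(p, r)  # 0.6 0.05
```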
intersections
List of lists of strings. The elements of this list correspond to query genes and are in the same order as the genes in "genes_metadata" -> "query" -> `query_name` -> "ensgs" (note that `query_name` varies with the query).
In other words, the gene IDs in the "ensgs" list are in the same order as the entries of the "intersections" structure. An empty list means "no intersection", a non-empty list means "this gene is part of the intersection between the term and the query", and a `null` value means that the gene was not examined for the response (possible with ordered queries).
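Because "intersections" and "ensgs" are aligned by position, the genes in a term's intersection can be recovered by pairing the two lists. A minimal sketch, assuming a single result row `term` and the "ensgs" list for its query have already been extracted from the response; `intersecting_genes` is a hypothetical helper, and the toy data below (placeholder IDs and list contents) is illustrative only, not real output.

```python
def intersecting_genes(term, ensgs):
    """Return the IDs from `ensgs` that are part of the term's intersection.

    Empty lists (no intersection) and None entries (gene not examined,
    e.g. in ordered queries) are skipped.
    """
    intersections = term["intersections"]
    if len(ensgs) != len(intersections):
        raise ValueError("ensgs and intersections must have the same length")
    return [gene for gene, hit in zip(ensgs, intersections) if hit]


# Toy data: placeholder IDs, list contents are illustrative only.
ensgs = ["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"]
term = {"intersections": [["x"], [], None, ["y"]]}
print(intersecting_genes(term, ensgs))  # ['ENSG_A', 'ENSG_D']
```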
source
The abbreviation of the datasource for the term. Currently, the possible datasources are:
  • GO:MF - Gene Ontology Molecular Function branch
  • GO:BP - Gene Ontology Biological Process branch
  • GO:CC - Gene Ontology Cellular Component branch
  • KEGG - KEGG pathways
  • REAC - Reactome pathways
  • WP - WikiPathways
  • TF - Transfac transcription factor binding site predictions
  • MIRNA - miRTarBase miRNA targets
  • HPA - Human Protein Atlas expression data
  • CORUM - Manually annotated protein complexes from mammalian organisms.
  • HP - Human Phenotype Ontology, a standardized vocabulary of phenotypic abnormalities encountered in human disease.
query
The name of the input query. By default, this is the query's position in the request prefixed with "query_" (e.g. query_1, query_2). If the user named the query, the value is the user-defined name.
source_order
The numeric order of the term within its datasource. Important for drawing reproducible Manhattan plots across different platforms.
group_id
The identifier of a group of terms that are connected by previously known relations. For example, it identifies the different Gene Ontology subgraphs that can be formed from the results based on the connections defined in GO.
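A common post-processing step is to keep only significant terms, optionally from one datasource, ordered by p-value. The sketch below relies only on the documented fields `significant`, `source` and `p_value`; `significant_terms` is a hypothetical helper and the rows are made-up placeholders, not real results.

```python
def significant_terms(results, source=None):
    """Keep significant result rows, optionally from one datasource, sorted by p-value."""
    rows = [row for row in results
            if row["significant"] and (source is None or row["source"] == source)]
    return sorted(rows, key=lambda row: row["p_value"])


# Made-up rows with only the fields this sketch reads.
rows = [
    {"native": "T1", "source": "GO:BP", "p_value": 0.01, "significant": True},
    {"native": "T2", "source": "KEGG", "p_value": 0.001, "significant": True},
    {"native": "T3", "source": "GO:BP", "p_value": 0.2, "significant": False},
]
print([row["native"] for row in significant_terms(rows)])  # ['T2', 'T1']
```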

Simple python example

import requests
r = requests.post(
    url='/gprofiler_beta/api/gost/profile/',
    json={
        'organism':'hsapiens',
        'query':["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
    }
    )
r.json()['result']

Simple CURL example

curl -X 'POST' -H 'Content-Type: application/json' -d '{"organism": "hsapiens", "query": ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]}' '/gprofiler_beta/api/gost/profile/'

Python example with more parameters set to non-default values

import requests
r = requests.post(
    url='/gprofiler_beta/api/gost/profile/',
    json={
        'organism': 'hsapiens',
        'query': ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
        'sources': ['GO:MF', 'GO:CC', 'GO:BP'],  # only look into Gene Ontology terms
        'user_threshold': 1e-8,  # use a stricter significance threshold
        'significance_threshold_method': 'bonferroni',  # use Bonferroni correction instead of the default 'g_SCS'
        'no_evidences': True,  # skip the lookup of evidence codes; speeds up queries when evidence codes are not needed
        'no_iea': True,  # ignore electronically inferred GO annotations

        'domain_scope': 'custom',  # use the genes in the AFFY_HG_U133A probe set as the statistical background
        'background': 'AFFY_HG_U133A',
    },
    headers={
        'User-Agent': 'FullPythonRequest'
    }
)
r.json()['result']

g:Convert

URL: /gprofiler_beta/api/convert/convert/
METHOD: POST
PARAMETERS:
organism
String. ID of the species to be queried. The list of possible IDs can be seen on the organisms list page.
organism:"hsapiens"
query
String or list of strings.
query:["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]
query:"CASQ2 CASQ1 GSTO1 DMD GSTM2"
target
String. Namespace to convert to.
target:"UCSC"
numeric_ns
String. Default "ENTREZGENE_ACC". Indicates which namespace to use when input IDs are numeric.
output
String. Default "json". No other options should be used at the moment.
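The two query forms shown for g:Convert are interchangeable in these examples, so a client that accepts either form can normalize the query before building the request body. This sketch assumes a string query is whitespace-separated, as in the example; `normalize_query` and `build_convert_payload` are hypothetical helpers. The same body shape applies to g:Orth, where `target` is an organism ID instead of a namespace.

```python
def normalize_query(query):
    """Return the query as a list of IDs (assumes string queries are whitespace-separated)."""
    if isinstance(query, str):
        return query.split()
    return list(query)


def build_convert_payload(organism, query, target, numeric_ns=None):
    """Assemble a g:Convert request body; numeric_ns is sent only when given."""
    payload = {"organism": organism, "target": target, "query": normalize_query(query)}
    if numeric_ns is not None:
        payload["numeric_ns"] = numeric_ns
    return payload


body = build_convert_payload("hsapiens", "CASQ2 CASQ1 GSTO1 DMD GSTM2", "UCSC")
print(body["query"])  # ['CASQ2', 'CASQ1', 'GSTO1', 'DMD', 'GSTM2']
```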

Simple python example

import requests
r = requests.post(
    url='/gprofiler_beta/api/convert/convert/',
    json={
        'organism':'hsapiens',
        'target':'UCSC',
        'query':["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
    }
    )
r.json()['result']

Simple CURL example

curl -X 'POST' -H 'Content-Type: application/json' -d '{"organism": "hsapiens", "target": "UCSC", "query": ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]}' '/gprofiler_beta/api/convert/convert/'

g:Orth

URL: /gprofiler_beta/api/orth/orth/
METHOD: POST
PARAMETERS:
organism
String. ID of the species to be queried. The list of possible IDs can be seen on the organisms list page.
organism:"hsapiens"
query
String or list of strings.
query:["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]
query:"CASQ2 CASQ1 GSTO1 DMD GSTM2"
target
String. Organism ID of the orthology target.
target:"mmusculus"
numeric_ns
String. Default "ENTREZGENE_ACC". Indicates which namespace to use when input IDs are numeric.
output
String. Default "json". No other options should be used at the moment.

Simple python example

import requests
r = requests.post(
    url='/gprofiler_beta/api/orth/orth/',
    json={
        'organism':'hsapiens',
        'target':'mmusculus',
        'query':["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"],
    }
    )
r.json()['result']

Simple CURL example

curl -X 'POST' -H 'Content-Type: application/json' -d '{"organism": "hsapiens", "target": "mmusculus", "query": ["CASQ2", "CASQ1", "GSTO1", "DMD", "GSTM2"]}' '/gprofiler_beta/api/orth/orth/'

g:SNPense

URL: /gprofiler_beta/api/snpense/snpense/
METHOD: POST
PARAMETERS:
query
String or list of strings.
query:["rs11734132", "rs7961894", "rs4305276", "rs17396340"]
query:"rs11734132 rs7961894 rs4305276 rs17396340"
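All the g:SNPense examples use dbSNP rs identifiers. A client-side sanity check can flag entries that do not look like rs IDs before submitting. Treating anything other than `rs` followed by digits as suspect is an assumption of this sketch, not a documented rule of the service; `split_rsids` is a hypothetical helper and `chr1:12345` is a made-up input.

```python
import re

# Assumed shape of a dbSNP rs identifier: "rs" followed by digits.
RSID_PATTERN = re.compile(r"rs\d+")


def split_rsids(query):
    """Return (all IDs, IDs that do not look like rs identifiers)."""
    ids = query.split() if isinstance(query, str) else list(query)
    suspect = [i for i in ids if not RSID_PATTERN.fullmatch(i)]
    return ids, suspect


ids, suspect = split_rsids("rs11734132 rs7961894 chr1:12345")
print(suspect)  # ['chr1:12345']
```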

Simple python example

import requests
r = requests.post(
    url='/gprofiler_beta/api/snpense/snpense/',
    json={
        'organism':'hsapiens',
        'query':['rs11734132', 'rs7961894', 'rs4305276'],
    }
    )
r.json()['result']

Simple CURL example

curl -X 'POST' -H 'Content-Type: application/json' -d '{"query": ["rs11734132", "rs7961894", "rs4305276"]}' '/gprofiler_beta/api/snpense/snpense/'

Organisms list

This is a utility method for discovering the species available from g:Profiler along with their associated namespaces.
URL: /gprofiler_beta/api/util/organisms_list/
METHOD: GET
PARAMETERS:
organism
String. ID of a particular species. By default no ID is given, in which case the endpoint returns data for all available species.
organism=hsapiens
extra_data
Boolean. Whether to include lists of available namespaces for organisms.
extra_data=True
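The query-string form of the GET examples can be generated rather than written by hand. A minimal sketch using `urllib.parse.urlencode`; `organisms_list_path` is a hypothetical helper, and sending the literal string `True` for `extra_data` simply mirrors the cURL example.

```python
from urllib.parse import urlencode


def organisms_list_path(organism=None, extra_data=False):
    """Build the organisms_list path with optional query-string parameters."""
    params = {}
    if organism is not None:
        params["organism"] = organism
    if extra_data:
        params["extra_data"] = "True"  # mirrors the cURL example's value
    path = "/gprofiler_beta/api/util/organisms_list"
    return path + ("?" + urlencode(params) if params else "")


print(organisms_list_path("hsapiens", extra_data=True))
# /gprofiler_beta/api/util/organisms_list?organism=hsapiens&extra_data=True
```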

Simple CURL example

curl  '/gprofiler_beta/api/util/organisms_list'

CURL example for a specific organism with list of namespaces

curl  '/gprofiler_beta/api/util/organisms_list?organism=hsapiens&extra_data=True'

CURL example with json parameters

curl  -H "Content-type: application/json" -H "Accept: application/json" -X 'GET' -d '{"organism":"hsapiens", "extra_data":true}'  '/gprofiler_beta/api/util/organisms_list'

Data versions

This is a utility method for discovering the versions of the data sources available for g:GOSt.
URL: /gprofiler_beta/api/util/data_versions/
METHOD: GET
PARAMETERS:
organism
String. ID of a particular species.
organism=hsapiens

CURL example

curl  '/gprofiler_beta/api/util/data_versions?organism=hsapiens'

CURL example with json parameters

curl  -H "Content-type: application/json" -H "Accept: application/json" -X 'GET' -d '{"organism":"hsapiens"}'  '/gprofiler_beta/api/util/data_versions'