VisHiC is a web-based tool for hierarchical clustering of gene expression data followed by automatic functional enrichment analysis of the derived clusters. The unique feature of VisHiC is the global enrichment analysis of every possible cluster for shared biological function, combined with a compact global visualization that highlights the major gene clusters that are co-expressed and statistically significantly enriched in biological terms. VisHiC utilises Gene Ontology, well-curated pathway databases (KEGG, Reactome), regulatory motifs from Transfac, microRNA target sites from miRBase, CORUM protein complexes, the Human Protein Atlas (HPA) and the Human Phenotype Ontology (HPO) to provide information on the shared regulatory mechanisms of the given genes. All this information is presented in an easily comprehensible bird's-eye view that outlines interesting groups of genes in color-coded form.
Datasets pass through several stages of analysis:
A key feature of VisHiC is the search for, and visualization of, clusters that are significantly enriched in biological terms. See the About input section for more details.
VisHiC supports gene expression datasets from all major organisms, provided they come in a standardized tab-separated format. Users can upload their own gene expression datasets through a simple web upload form, and the results are linked to the given e-mail address. We also provide a variety of public datasets. VisHiC supports hundreds of types of gene identifiers, so users can input data with their favourite gene names or database IDs. The supported identifiers are the same as those available in the public web server g:Profiler.
VisHiC was first published in 2009 by Krushevskaya et al.
In this section we cover performing your very first analysis and visualization with VisHiC. The easiest way to learn how to use VisHiC is to analyse one of the publicly available datasets. This procedure requires only two simple input stages:
To make things even easier, the welcome page of the application provides a sample query. Just click on it and press the Start the analysis button below. You will be redirected to the analysis page.
Take your time to explore the results page: hover the mouse over different parts of the images to see interactive information and links. To learn more about the results and options, see the About output and About input sections. If you have any problems or questions, we will be more than happy to assist you.
To analyse a dataset with VisHiC, the user must provide a gene expression dataset and indicate the correct organism during the upload process. VisHiC takes care of the rest.
The work of VisHiC starts with preprocessing the dataset. During this step the dataset is hierarchically clustered using Hybrid Hierarchical Clustering. Both Pearson correlation and Euclidean distance are used to measure the similarity between elements, so that the user can later select the measure they prefer. Next, each cluster of the resulting hierarchy is annotated using g:Profiler. The annotations are computed so that the user can later select a multiple testing correction to reduce the number of false positives arising from the many enrichment tests. By default, a special correction that takes the hierarchical structure of GO into account is selected (the g:SCS method of g:Profiler), but standard methods such as Bonferroni correction and Benjamini-Hochberg FDR correction can also be applied.
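To make the standard correction options concrete, here is a minimal sketch of Bonferroni and Benjamini-Hochberg FDR adjustment applied to a list of raw p-values. It does not implement g:SCS, which is specific to g:Profiler; the function names and p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.20]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate and therefore keeps more annotations at the same nominal level.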
The second stage of analysis is performed when the user chooses to analyse and visualize a particular dataset.
The user has three options for cutting the hierarchical tree.
With the best annotation strategy, the list of interesting clusters is defined by the best annotations of the clusters. In other words, each cluster is characterized by a single (best) annotation: the one with the lowest p-value.
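A minimal sketch of selecting each cluster's best annotation, assuming the annotations of a cluster are available as (term, p-value) pairs. The data layout, the term IDs and the function name are hypothetical.

```python
def best_annotation(annotations):
    """Return the (term, p_value) pair with the smallest p-value, or None."""
    if not annotations:
        return None
    return min(annotations, key=lambda pair: pair[1])


# Hypothetical clusters mapped to their enriched terms and p-values.
clusters = {
    "c1": [("GO:0006412", 1e-8), ("KEGG:03010", 1e-12)],
    "c2": [("GO:0007049", 3e-4)],
    "c3": [],
}

best = {cid: best_annotation(terms) for cid, terms in clusters.items()}
for cid, pair in best.items():
    print(cid, pair)
```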
The best annotation cutting strategy is performed in two stages:
With the annotation score strategy, the list of interesting clusters is based on cumulative scores of the clusters: for each cluster, a score representing the average goodness of its annotations is computed.
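The exact scoring formula is not given here. As a hypothetical illustration only, the sketch below scores a cluster by the mean of -log10(p) over its significant annotations, so clusters whose annotations are more strongly enriched score higher. Both the threshold and the formula are assumptions, not VisHiC's definition.

```python
import math


def annotation_score(p_values, threshold=0.05):
    """Hypothetical score: mean -log10(p) of annotations below the threshold."""
    significant = [p for p in p_values if p < threshold]
    if not significant:
        return 0.0
    return sum(-math.log10(p) for p in significant) / len(significant)


# Averages -log10 of 1e-6 and 1e-4; the 0.2 annotation is ignored.
print(annotation_score([1e-6, 1e-4, 0.2]))
```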
The annotation score cutting strategy is also performed in two stages:
The first annotation cutting strategy is based on the structure of the clustering hierarchy.
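One plausible reading of this strategy, stated here as an assumption rather than a description of VisHiC's code, is a top-down walk: starting at the root, the first cluster on each branch that has a significant annotation is reported, and its subclusters are not examined further. The `Node` class and the p-values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    min_p: float  # smallest annotation p-value in this cluster (1.0 if none)
    children: list = field(default_factory=list)


def first_annotated(node, threshold=0.05):
    """Top-down search that stops descending at the first significant cluster."""
    if node.min_p < threshold:
        return [node.name]
    found = []
    for child in node.children:
        found.extend(first_annotated(child, threshold))
    return found


tree = Node("root", 1.0, [
    Node("A", 0.01, [Node("A1", 1e-5), Node("A2", 0.3)]),
    Node("B", 0.4, [Node("B1", 0.002), Node("B2", 0.9)]),
])
print(first_annotated(tree))  # ['A', 'B1']
```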
In addition to the cutting strategy, it is also possible to set:
VisHiC output consists of several pages:
This page contains three main parts: an interactive view that represents the dataset; a list of interesting clusters, each with its best annotations, their domains and p-values (summary page); and a list of unique statistically significant annotations. Unique annotations are annotations that are present in only one dense cluster of the given dataset. In addition, we provide a list of the genes in the dataset with their descriptions.
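Following the definition above, a unique annotation is one that occurs in exactly one dense cluster. A minimal sketch, assuming each interesting cluster is given as a set of significant term IDs; the cluster and term IDs are hypothetical placeholders.

```python
from collections import Counter


def unique_annotations(cluster_terms):
    """Map each cluster to the terms that occur in no other cluster."""
    counts = Counter(term for terms in cluster_terms.values() for term in set(terms))
    return {
        cid: sorted(t for t in set(terms) if counts[t] == 1)
        for cid, terms in cluster_terms.items()
    }


clusters = {
    "c1": {"GO:A", "GO:B"},
    "c2": {"GO:B", "KEGG:C"},
    "c3": {"REAC:D"},
}
print(unique_annotations(clusters))
```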
The interactive view has two parts: expression profiles (heatmap) and a hierarchical clustering tree (dendrogram).
Expression profiles are visualized using a blue-white-red color gradient, which is eye-friendly and intuitive: blue denotes genes with low expression values and red denotes highly expressed genes. Rows of the heatmap represent genes and columns represent biological conditions (samples). If present, sample annotations are shown above the heatmap columns.
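A blue-white-red gradient can be produced by linear interpolation. This sketch assumes that each gene's values are first rescaled to [-1, 1], with -1 mapped to pure blue, 0 to white and 1 to pure red; the per-gene rescaling and the exact RGB endpoints are assumptions, not documented VisHiC behavior.

```python
def blue_white_red(value):
    """Map a value in [-1, 1] to an (r, g, b) tuple with 0-255 channels."""
    v = max(-1.0, min(1.0, value))  # clamp out-of-range values
    if v < 0:
        # Blend from white (v = 0) towards blue (v = -1).
        fade = round(255 * (1 + v))
        return (fade, fade, 255)
    # Blend from white (v = 0) towards red (v = 1).
    fade = round(255 * (1 - v))
    return (255, fade, fade)


def scale_row(row):
    """Rescale one gene's expression values linearly to [-1, 1]."""
    lo, hi = min(row), max(row)
    if hi == lo:
        return [0.0] * len(row)
    return [2 * (x - lo) / (hi - lo) - 1 for x in row]


row = [2.0, 5.0, 8.0]
print([blue_white_red(v) for v in scale_row(row)])
```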
The tree depicts the hierarchical clustering of the gene expression data, with individual elements at one end and a single cluster containing all elements at the other. Each node of the tree represents a cluster, and its distance reflects the similarity of the elements in the cluster: the smaller the distance, the more similar the elements. The distance scale is shown on the x-axis. Similarity is measured using either Pearson correlation or Euclidean distance and is scaled to the range [0, 1].
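The text states that both measures are scaled to [0, 1] but does not give the formula. A common choice, assumed here, is d = (1 - r) / 2 for Pearson correlation r, and division by the largest pairwise distance for Euclidean distance. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pearson_distance(x, y):
    """Map r in [-1, 1] to a distance in [0, 1]; 0 means identical shape."""
    return (1 - pearson(x, y)) / 2


def scaled_euclidean(profiles):
    """Pairwise Euclidean distances divided by the largest one."""
    n = len(profiles)
    d = [[math.dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    biggest = max(max(row) for row in d) or 1.0
    return [[v / biggest for v in row] for row in d]


a, b, c = [1, 2, 3], [2, 4, 6], [3, 2, 1]
print(pearson_distance(a, b), pearson_distance(a, c))
```

Note that the two measures can disagree: profiles `a` and `b` have the same shape (Pearson distance near 0) even though their values differ, which is why offering both lets the user choose whether shape or magnitude matters.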
However, the picture differs slightly from the usual one. While cutting the tree, VisHiC searches for interesting clusters: clusters that meet the requirements of the cutting strategy and have a statistically significant annotation. These clusters are denoted by colored rectangles. The size of a rectangle reflects the size of the underlying cluster, that is, the number of genes in it. The colors of a cluster rectangle encode the annotations found for the cluster, and the sizes of the inner rectangles reflect the proportional distribution of the cluster's annotations by domain.
Next to each cluster rectangle there is a grey bar that shows the proportion of the cluster's genes that are annotated with at least one of its statistically significant annotations.
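Both quantities in the cluster glyph are simple proportions. A sketch, assuming a cluster's significant annotations are given as a mapping from term ID to (domain, set of annotated genes) and that inner rectangles are sized by the number of annotations per domain; the data and function names are hypothetical.

```python
from collections import Counter


def domain_shares(annotations):
    """Fraction of a cluster's significant annotations in each domain."""
    counts = Counter(domain for domain, _genes in annotations.values())
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}


def annotated_fraction(cluster_genes, annotations):
    """Share of cluster genes covered by at least one significant annotation."""
    covered = set()
    for _domain, genes in annotations.values():
        covered |= genes & cluster_genes
    return len(covered) / len(cluster_genes)


genes = {"g1", "g2", "g3", "g4"}
annotations = {
    "GO:X": ("GO", {"g1", "g2"}),
    "GO:Y": ("GO", {"g2"}),
    "KEGG:Z": ("KEGG", {"g3"}),
}
print(domain_shares(annotations))               # GO 2/3, KEGG 1/3
print(annotated_fraction(genes, annotations))   # 3 of 4 genes -> 0.75
```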
The user can also choose to omit sparse clusters from the final output. In this case, the location of each sparse cluster is shown as an empty branch.
The search form lets you locate a gene or annotation term among the interesting clusters, provided the corresponding filter is selected. To search for multiple genes or terms, separate the queries with a semicolon and a space ("; "). The form suggests keywords while you type, but you are free to search for your own gene or term IDs. Matching clusters are highlighted in the dendrogram.
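The multi-query syntax can be illustrated with a short sketch: the query string is split on "; " and each part is matched against the genes and terms of every interesting cluster. The cluster data and function name are hypothetical, and case-insensitive exact matching is an assumption.

```python
def search_clusters(query, clusters):
    """Return IDs of clusters containing any gene or term in the query."""
    wanted = {part.strip().lower() for part in query.split("; ") if part.strip()}
    hits = []
    for cid, members in clusters.items():
        if wanted & {m.lower() for m in members}:
            hits.append(cid)
    return hits


clusters = {
    "c1": {"TP53", "MDM2", "GO:0006915"},
    "c2": {"CDK1", "CCNB1", "GO:0007049"},
    "c3": {"ALB"},
}
print(search_clusters("tp53; GO:0007049", clusters))  # ['c1', 'c2']
```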
The cluster view page contains information about a single selected cluster. This view is also interactive.
The main parts of the page:
In this section we describe how to analyse and visualize your own dataset.
You can upload your data in the Data upload section: fill in the fields of the form and follow the instructions. Note that the name of the uploaded file is used as the name of the dataset in further analysis. It is very important to specify the organism of the dataset explicitly.
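Before uploading, it can help to check that the file parses as a tab-separated matrix. The sketch below assumes a common layout: one header row of sample names, then one row per gene with the identifier in the first column. This layout and the function name are assumptions, so check the upload form for the exact format VisHiC expects.

```python
import csv
import io


def read_expression_tsv(handle):
    """Parse a tab-separated expression matrix into (samples, {gene: values})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first header cell labels the gene ID column
    data = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
        data[row[0]] = [float(x) for x in row[1:]]
    return samples, data


example = "gene\ts1\ts2\nTP53\t1.5\t2.0\nMDM2\t0.3\t-1.2\n"
samples, data = read_expression_tsv(io.StringIO(example))
print(samples, data["MDM2"])
```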
To make a dataset available for analysis, we need to preprocess it. This step is needed because of the size of experimental datasets: clustering and annotation can take a while (for example, a dataset with 15 conditions and 32,000 genes takes approximately 1 hour). During preprocessing, VisHiC computes the hierarchical clustering of the dataset and annotates all of its clusters (a hierarchy over n elements contains up to 2n - 1 clusters, counting the individual elements). We take care of preprocessing your data, and once it is finished you will receive an e-mail with an access link to the results. Preprocessing is performed so that you can later apply the same parameters to your dataset as to our public datasets.
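To see why every cluster of a hierarchy can be annotated in advance, note that a binary tree over n elements has n singleton leaves and n - 1 merged clusters, 2n - 1 nodes in total (non-binary trees have fewer). The sketch below uses plain average-linkage agglomerative clustering as a stand-in; it is not the Hybrid Hierarchical Clustering that VisHiC uses, and the function name and profiles are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(points):
    """Naive agglomerative clustering; returns every cluster as a frozenset."""
    clusters = [frozenset([i]) for i in range(len(points))]
    all_clusters = list(clusters)  # singletons count as clusters too

    def dist(a, b):
        # Average Euclidean distance between members of two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        all_clusters.append(merged)
    return all_clusters


profiles = [[0.0, 1.0], [0.1, 1.1], [5.0, 5.0], [5.2, 4.9]]
hierarchy = average_linkage(profiles)
print(len(hierarchy))  # 2n - 1 = 7 clusters for n = 4 profiles
```

Each frozenset in `hierarchy` is a cluster that would be sent for enrichment analysis, so the annotation work grows linearly with the number of genes rather than exponentially.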
When uploading and preprocessing of the dataset are done, a link to your results is sent to the contact e-mail address given in the upload form. The page behind this link also lists all your previously uploaded and successfully preprocessed datasets.
The speed of the analysis depends strongly on the size of the dataset. Applying the additional threshold or selecting fewer annotation types reduces the number of computations and speeds up the process. You can also increase the minimum size and decrease the maximum size of potential interesting clusters.
You can apply an additional threshold for annotations, which guarantees that all annotations considered when cutting the tree with the best annotation strategy are highly statistically significant. You can also allow smaller clusters (from 5 to 100) to be found.
An additional threshold on the annotations will most likely reduce the size of the picture. Selecting fewer annotation types has a similar effect.
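The effect of these settings can be sketched as a filter applied before the tree is cut: clusters outside the size limits are skipped, and annotations above the additional p-value threshold are ignored, so fewer candidates remain to compute and draw. The parameter names and default values are hypothetical, chosen only to mirror the 5 to 100 range mentioned above.

```python
def candidate_clusters(clusters, min_size=5, max_size=100, p_threshold=1e-3):
    """Keep clusters within the size limits that still have a strong annotation.

    clusters maps a cluster ID to (number_of_genes, [annotation p-values]).
    """
    kept = {}
    for cid, (size, p_values) in clusters.items():
        if not (min_size <= size <= max_size):
            continue  # outside the allowed cluster size range
        strong = [p for p in p_values if p < p_threshold]
        if strong:
            kept[cid] = strong
    return kept


clusters = {
    "c1": (3, [1e-6]),         # too small
    "c2": (40, [2e-4, 0.03]),  # kept; only 2e-4 passes the threshold
    "c3": (250, [1e-9]),       # too large
    "c4": (60, [0.01]),        # no annotation passes the threshold
}
print(candidate_clusters(clusters))  # {'c2': [0.0002]}
```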