

GraphWeb is a public web server for graph-based analysis of biological networks that:
  • analyses directed and undirected, weighted and unweighted heterogeneous networks of genes, proteins and microarray probesets for many eukaryotic genomes;
  • integrates multiple diverse datasets into global networks;
  • incorporates multispecies data using gene orthology mapping;
  • filters nodes and edges based on dataset support, edge weight and node annotation;
  • detects gene modules from networks using a collection of algorithms;
  • interprets discovered modules using Gene Ontology, pathways, and cis-regulatory motifs.
J. Reimand*, L. Tooming*, H. Peterson, P. Adler, J. Vilo: GraphWeb: mining heterogeneous biological networks for gene modules with functional significance. (2008)
Nucl. Acids Res. 2008 36: W452-W459; doi:10.1093/nar/gkn230 [ PDF ]
GraphWeb help

Introduction | Example | Suggestions for browsing | About this document

Methods of input | Example inputs | Manage data uploads with g:PEDaM

Input syntax | Edge weights explained | Gene ID conversion and orthologs

Algorithms | MCL | Betweenness centrality clustering | Connected components | Strongly connected components | Biconnected components | Maximal cliques | Hub based modules | Whole graph | Network neighbourhood

Options and settings | Assign more weight to smaller networks | Create global network | Remove unknown genes | Remove ambiguous genes | Keep N% of highest degree nodes | Remove edges with less than N labels | Keep N% of heaviest edges | Module filtering | Hide modules with less than N nodes | Show N largest modules | Calculate g:Profiler scores | Sort modules by functional score | Hide insignificant modules

Output | Node name conversion | Comp | Number of nodes | List nodes | List edges | Zoom in | Label distribution | Score | g:Profiler annotations | Visualisations | Summary row | g:Cocoa | Searching the output

Known problems

Introduction

This application enables you to input graphs and explore their structure, extracting potentially interesting subgraphs (modules) from them with various algorithms. GraphWeb is especially suited for analysing biological networks (such as protein-protein interaction or gene regulation networks). In such a case, the nodes of the graphs are expected to represent genes or proteins, so the resulting modules can be sent as a query to g:Profiler, a tool that finds significant common properties of a group of genes and visualises the results. The edges of the graphs can be directed or undirected, and edge weights are also supported. Furthermore, it is possible to compare and combine data coming from different sources by using the mechanism of edge labels. Each edge in the graph can have one or more labels, and a separate edge weight can be associated with each label. It is also possible to assign a default weight to each label, which will be given to edges with that label that have no weight specified. Also, edges having a greater number of labels are given greater weights (by adding together their weights for all labels).

GraphWeb provides a simple syntax for representing edges (relationships between pairs of nodes), edge directions, weights and label information. After submitting the input, the program first converts gene IDs into a standard form, which makes it easy to combine data that uses different IDs. If you wish, genes of one species may be mapped into their orthologs in another species. After that, weights are assigned to all labels and then to edges. Then, optionally, filtering takes place, where low-weight edges and low-degree nodes are removed, based on user-specified thresholds. You can also specify a set of nodes whose neighbourhood will be extracted from the graph.

After the steps of reading the input and possibly filtering the resulting graph, we now have a final graph, which is then processed by a chosen algorithm. The algorithm returns a collection of modules. Each module is a group of nodes of the graph. The exact nature of the modules depends on the algorithm: they may be connected components, clusters, cliques, consist of neighbours of a central hub and so on. Each module is displayed on a separate line in the output table. Again, it is possible to filter some modules out (smaller ones, or those with no biological relevance detected). For each module, you can see information about the subgraph induced by the nodes of the module, and find out about the biological relevance of the module.

Example

To get a quick introduction to GraphWeb, you can try the following simple example. Copy the following input into the big textbox at the left side of the form. Note that each line describes an edge of the graph, listed as a pair of nodes.

A B
A C
B C
E F
G H
E H
F G
C D
E G
B K
T1 T2

Since we're not dealing with biological data in this example, you must turn off the checkbox "Hide insignificant modules" (default is on), otherwise the results will be discarded as biologically irrelevant. Choose the "connected components" algorithm and click the submit button. This is a good algorithm to begin analysing a graph with - it simply returns the separate connected parts of a graph.

In the output, you will see two modules displayed, which correspond to the connected components {A,B,C,D,K} and {E,F,G,H}. Check out the links under "Visual", "Nodes" and "Edges". Also check out what "Zoom in" does. Note that one connected component, {T1,T2}, was filtered out - modules smaller than 3 are not shown by default. Decrease the parameter "Remove modules with less than 3 nodes" in the form and submit again to see this component as well.
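
The result described above can be reproduced offline with a few lines of Python. This is only a minimal sketch of the connected components idea, not GraphWeb's own code; the function name connected_components and the constant MIN_MODULE_SIZE are made up for this illustration.

```python
from collections import defaultdict


def connected_components(edge_lines):
    """Return the connected components as sorted node lists, largest first."""
    adjacency = defaultdict(set)
    for line in edge_lines:
        a, b = line.split()
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # A depth-first traversal collects every node reachable from `start`.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        components.append(sorted(members))
    return sorted(components, key=len, reverse=True)


EXAMPLE = ["A B", "A C", "B C", "E F", "G H", "E H", "F G", "C D", "E G", "B K", "T1 T2"]
components = connected_components(EXAMPLE)

# Hypothetical constant mirroring the default of not showing modules smaller than 3.
MIN_MODULE_SIZE = 3
shown = [c for c in components if len(c) >= MIN_MODULE_SIZE]
print(shown)
```

Lowering MIN_MODULE_SIZE to 2 makes the component {T1,T2} appear in the list as well, which corresponds to decreasing the module size parameter in the form.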

Next, look at what the following input does and try to understand the results. Pay attention to the coloured edges in the visualisation. This will show how edge labels work. Again, you have to make sure that "Hide insignificant modules" is unchecked.

DS1: N1 N2
DS1: N2 N3
DS1: N3 N1
DS2: N1 N2
DS2: N2 N4
DS2: N2 N5
DS3: N4 N1
DS3: N2 N5
DS1: N3 N5
DS2: N3 N5
DS3: N3 N5

After trying and understanding this example, you might try what happens if you add the option "Remove edges with less than 2 labels".
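
To check your understanding of the labelled example, the following sketch counts how many labels each edge carries and keeps only the edges with at least two labels. It assumes undirected edges and the "Label: Node1 Node2" form used above; the names labels_per_edge and MIN_LABELS are illustrative only.

```python
from collections import defaultdict


def labels_per_edge(lines):
    """Map each undirected edge to the set of labels it appears with."""
    labels = defaultdict(set)
    for line in lines:
        label, nodes = line.split(": ", 1)
        a, b = nodes.split()
        # A frozenset makes "N3 N1" and "N1 N3" the same undirected edge.
        labels[frozenset((a, b))].add(label)
    return labels


LABELLED = [
    "DS1: N1 N2", "DS1: N2 N3", "DS1: N3 N1",
    "DS2: N1 N2", "DS2: N2 N4", "DS2: N2 N5",
    "DS3: N4 N1", "DS3: N2 N5",
    "DS1: N3 N5", "DS2: N3 N5", "DS3: N3 N5",
]
edge_labels = labels_per_edge(LABELLED)

MIN_LABELS = 2  # plays the role of N in "Remove edges with less than N labels"
kept = sorted(
    tuple(sorted(edge)) for edge, found in edge_labels.items() if len(found) >= MIN_LABELS
)
print(kept)
```

With these inputs only three edges survive: N1-N2 (two labels), N2-N5 (two labels) and N3-N5 (three labels).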

To see an example of how biological data is handled, check the box "From a file in our server" and try out some of the available data files.

Suggestions for browsing

JavaScript should be allowed on your browser for GraphWeb to work properly. Displaying the results page involves intensive computation and may take several minutes if the input is large. We recommend that you do not navigate away from the results page if computing the results took a long time, since pressing the 'back' button on your browser may cause them to be recomputed. For this reason, links on the results page have been set up to open in a new window or in a popup window.

About this document

This help file aims to describe all the features and functionality of GraphWeb. It is mostly accurate but slightly outdated, and we are currently updating it. Some features, such as the advanced input feature, are not fully described here yet. If you find discrepancies between this document and the actual behaviour of GraphWeb, please use the contact form and notify us.

Methods of input

The graph data can be input in several ways (which can be combined), all found in the leftmost column of the input form:

  • typing or pasting the data into a textbox;
  • giving a file from your computer that contains the data;
  • selecting an example input provided by us;
  • saving an input into your personal folder on our server using g:PEDaM.

The last two options are both available by selecting "Choose an existing file", in a common dropdown menu.

If more than one input source is used (e.g. there is text in the textbox and an example file is selected), the data is combined by concatenation. This makes it possible to manually add some edges to an input file without modifying the file itself, or to combine your data with our example inputs.

Example inputs

The example inputs are:

  • IntAct protein-protein interaction (PPI) data of various species ((C) European Bioinformatics Institute; Creative Commons Attribution License ). Included is a combined dataset of human and mouse networks, which you can use to test our ortholog conversion feature;
  • HPRD (Human Protein Reference Database) human PPI data ((C) Johns Hopkins University and the Institute of Bioinformatics);
  • ID-SERVE human PPI data ((C) 2002, 2003, Arun Ramani and Edward Marcotte);
  • TRANSFAC human transcription factor data;
  • Fraenkel Lab yeast gene regulation data (a directed graph).

When choosing an example input, make sure that you select the correct organism from the organism menu as well (the input name contains the name of the organism). To download or see the example inputs, click on the link above the dropdown menu.

Manage data uploads with g:PEDaM

Besides uploading a dataset from your computer for analysis, it is also possible to store your data permanently on our server in your personal data folder, using the g:PEDaM file manager. The g:PEDaM interface itself, containing tools for uploading, viewing and deleting stored inputs, is located in the bottom half of the leftmost column.

To create your personal storage folder, click on "Create new folder". You will be able to protect your folder with a password. Later, you can click on "Use my existing folder" to log in with your chosen password. The files that you have uploaded are available in a menu where they can be viewed and deleted.

The files you have uploaded can be selected for analysis from the same menu that contains example inputs, which can be displayed by choosing the option "From a file in our server" in the first column.

Input syntax

Each line of the input should either describe one edge of the graph or assign a weight to an edge label. For an edge with several labels, there should be a line for each label it has. The syntax to describe an edge is

[Label:] Node1 [<|>|<>] Node2 [[%|!]EdgeWeight]

where square brackets denote optional parts of the syntax and the symbol | separates alternatives. The label and node identifiers are case insensitive (so abc and ABC refer to the same object) and must not contain whitespace or the symbols <, >, and =. Whitespace is required after the colon following the label (this is because node names may contain colons), but it is optional elsewhere as long as the input remains unambiguous. Edges for which the label name is not present will be given the label DEFAULT.

The arrow symbol between the nodes shows the direction of the edge (one-directional or bidirectional). If the symbol is missing, the edge is assumed to be bidirectional. The whole graph is considered undirected if all edges in the input are bidirectional and directed if there is at least one edge where < or > is used as the separator. In the case of a directed graph, all undirected edges in the input will become two separate directed edges, pointed both ways.

GraphWeb also supports various methods for assigning edge weights. Weight here means the "trustworthiness" or "goodness" of the edge: the bigger the weight, the stronger the edge. The weights can be used in the MCL and betweenness centrality clustering algorithms, but also for simple edge filtering. The weights should be positive floating point numbers, written in the standard C format, with a dot (not a comma) as the decimal point. If an edge weight is not specified, the default weight of the label will be given to the edge. If an edge has multiple labels, the final weight of this edge will be the sum of the weights associated with it for each label. However, an edge appearing twice with the same label is not supported: if this occurs in the input, a warning is shown and only the largest of the conflicting weights is assigned. To learn more about edge weights, see the paragraph Edge weights explained.

The syntax for assigning a default weight to a label is

Label = Weight

Whitespace is optional. The weight of a label is used as the weight of those edges having the label that don't have a weight specified themselves. It should be a positive real number. One label should be assigned a weight only once in the input - in case of duplicate weights, all but the first one are ignored. The assignment can be made in any part of the input and takes effect for the whole graph, both for edges described before and after the assignment.
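
The two kinds of input lines can be recognised with regular expressions, as in the sketch below. It is an illustration of the syntax, not GraphWeb's actual parser, and it makes some simplifying assumptions: a weight must be separated from Node2 by whitespace, two nodes without an arrow must be separated by whitespace, and weights are plain decimal numbers (no exponents). The names EDGE_RE, LABEL_WEIGHT_RE and parse_line are made up for this example.

```python
import re

# "Label = Weight"
LABEL_WEIGHT_RE = re.compile(
    r"^(?P<label>[^\s<>=]+)\s*=\s*(?P<weight>\d+(?:\.\d+)?)$"
)

# "[Label:] Node1 [<|>|<>] Node2 [[%|!]EdgeWeight]"
EDGE_RE = re.compile(
    r"^(?:(?P<label>[^\s<>=]+):\s+)?"            # optional label, whitespace after the colon
    r"(?P<node1>[^\s<>=]+)"
    r"(?:\s*(?P<arrow><>|<|>)\s*|\s+)"           # an arrow, or just whitespace
    r"(?P<node2>[^\s<>=]+)"
    r"(?:\s+(?P<mode>[%!])?(?P<weight>\d+(?:\.\d+)?))?$"
)


def parse_line(line):
    """Parse one input line into a dictionary; raise ValueError if it does not match."""
    line = line.strip()
    match = LABEL_WEIGHT_RE.match(line)
    if match:
        return {
            "kind": "label_weight",
            "label": match.group("label").upper(),   # identifiers are case insensitive
            "weight": float(match.group("weight")),
        }
    match = EDGE_RE.match(line)
    if match is None:
        raise ValueError("unrecognised input line: " + line)
    weight = match.group("weight")
    return {
        "kind": "edge",
        "label": (match.group("label") or "DEFAULT").upper(),
        "node1": match.group("node1").upper(),
        "direction": match.group("arrow") or "<>",  # a missing arrow means bidirectional
        "node2": match.group("node2").upper(),
        "mode": match.group("mode"),                 # None, "%" or "!"
        "weight": float(weight) if weight else None,
    }


print(parse_line("ds1: a > b %40"))
print(parse_line("A B"))
print(parse_line("DS1 = 200"))
```

Because the label part requires whitespace after the colon, a node name such as GO:0001 at the start of a line is read as a node rather than as a label, in line with the rule stated above.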

Edge weights explained

As explained above, each edge can have a weight assigned to it on the line where the edge is described. Labels can also have weights assigned with the "Label = Weight" syntax. Weights are relevant in the algorithms MCL and betweenness centrality clustering (although both work without weights as well). They can also be used to filter out weaker edges in the graph. If no weights for edges and labels are specified in the graph, the weight of each edge will simply be the number of labels it appears with.

In the normal behaviour for assigning weights to edges, the weights will be normalised within each label. This means that all weights of edges with a given label are scaled into values between zero and one and then multiplied by the label weight. As a result, the edge with the highest weight with this label will receive exactly the label weight (or 1 if there is no label weight).

If an exclamation mark ("!") is added in front of an edge weight, this weight will not be a part of the normalisation process. Instead, the edge will just get the weight after the exclamation mark.

If a percentage mark ("%") is inserted before an edge weight, the edge weight will be specified as a percentage of the label weight. For example, if the label weight is 200 and an edge with this label is given the weight %40, the weight of the labeled edge will be 40% of 200 or 80.

If an edge has no weight specified, its weight for this label will be the corresponding label weight.
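
The rules above can be summarised in a short sketch for a single label. It is an interpretation of the description, not GraphWeb's code. In particular, it assumes that normalisation divides by the largest plain weight within the label, and that weights marked with "!" or "%" do not take part in finding that maximum. The name resolve_weights is hypothetical.

```python
def resolve_weights(edges, label_weight=1.0):
    """Compute the weight of each edge within one label.

    `edges` is a list of (edge_id, mode, weight) tuples, where mode is
    None (plain weight), "!" (exempt from normalisation) or "%" (percentage
    of the label weight), and weight may be None if it was not specified.
    """
    plain = [w for _, mode, w in edges if mode is None and w is not None]
    largest = max(plain, default=None)

    resolved = {}
    for edge_id, mode, weight in edges:
        if weight is None:
            resolved[edge_id] = label_weight              # default: the label weight
        elif mode == "!":
            resolved[edge_id] = weight                     # taken exactly as written
        elif mode == "%":
            resolved[edge_id] = label_weight * weight / 100.0
        else:
            resolved[edge_id] = label_weight * weight / largest  # scaled into (0, 1] first
    return resolved


weights = resolve_weights(
    [("e1", None, 5.0), ("e2", None, 10.0), ("e3", "!", 7.0),
     ("e4", "%", 40.0), ("e5", None, None)],
    label_weight=200.0,
)
print(weights)
```

Here e2 has the largest plain weight and therefore receives exactly the label weight of 200, while e4 reproduces the percentage example from the text (40% of 200 is 80). The final weight of an edge with several labels would then be the sum of its per-label weights.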

If a label doesn't have a weight specified in the input, its weight will normally be equal to 1, which means that edge weights will be normalised into the interval (0,1]. However, there is also a special setting, Assign more weight to smaller networks, which gives an alternative method of allocating weights to labels. With this method, labels with more edges get smaller weights and labels with fewer edges get larger weights. You can use it instead of choosing the weights yourself if you happen to consider your smaller datasets more trustworthy than large ones.

Gene ID conversion and orthologs

If the ID of a node is a valid protein or gene ID, g:Convert is used to convert it to a standard form (Ensembl gene name). This allows the input to contain different kinds of IDs. When two node IDs in the input really represent the same gene, they are considered one node. To use this feature, the correct organism must be chosen from the organism menu.

GraphWeb also supports converting the genes of one species into orthologous genes of another species. Orthologs are genes/proteins in different organisms that have the same evolutionary origin. It may be of interest to consider graphs where vertices are proteins of one organism (e.g. human) and edges are between interacting human proteins, but also between those proteins whose orthologs in some other organism (e.g. mouse) interact with each other. This is easy to do: just concatenate a human dataset with a mouse dataset (such as in the example dataset "IntAct human and mouse"), select human as the main organism, select the checkbox "Convert orthologs" and select mouse as the ortholog organism. All mouse genes that have human orthologs will be replaced with those orthologs in the graph.

It is possible that a gene in the input has either no Ensembl ID, or has several different IDs. GraphWeb's behaviour in this case depends on the parameters Remove unknown genes and Remove ambiguous genes. The unknown or ambiguous genes are either removed from the graph or retained with their original IDs.

In the output, clicking on the link Node name conversion displays information about which IDs were converted, which were unknown and which were ambiguous.

Algorithms

In the second column of the form you can choose the method that is used to extract subgraphs from a graph. Some of these methods also need parameters to be specified.

Graph clustering

MCL

Uses the Markov Cluster Algorithm by Stijn van Dongen to split the graph into clusters. Each node will belong to exactly one cluster. The output of the command-line program mcl implementing this algorithm is also displayed. The algorithm may run for several minutes on large graphs.

The parameter called "inflation" can be used to impact the granularity of clustering. Values should range between 1.1 and 5.0, with 2.0 being the default and also a good value to start with.

A short overview of the algorithm follows (see the website for details). A set of nodes is considered a cluster if a random walk in the graph is likely to remain in that set. Edge weights are interpreted as probabilities of traversing that edge on one step of the random walk. For example, if a node has three out-edges with weights 2, 3 and 7, the probabilities of the next step of the random walk traversing those edges are 2/11, 3/11 and 7/11, respectively. The algorithm alternates two kinds of steps: expansion and inflation. Expansion means calculating the random walk probabilities for longer walks, which is accomplished by squaring the adjacency matrix. At the inflation step, entries of the matrix are all raised to the power of the inflation parameter, which increases the difference between high- and low-weighted edges. This process converges to a steady state from which the clustering is obtained.

We use the MCL implementation by van Dongen. For information about the speed of this algorithm, see How fast is MCL on the MCL website (basically, it is usually linear in the number of nodes, but with quite a large constant).

Betweenness centrality clustering

A shortest path from one node to another is a (directed) path with the property that the sum of the weights of its edges is the minimal possible. If the graph is unweighted, the shortest path is the one with the least number of edges. There can be more than one shortest path between a pair of nodes, if several paths with the same length exist.

For the simple case where every node pair has a unique shortest path between them, the betweenness centrality of an edge is defined as the number of shortest paths passing through the given edge, where paths over all node pairs are considered. In the general case, if there are n shortest paths between some pair of nodes, each path is counted 1/n times. A definition of betweenness centrality is given here.

In betweenness centrality clustering, the betweenness centralities of all edges in the graph are calculated and the edge with the highest centrality is removed. This process is repeated until enough edges of the graph have been removed, at which point the connected components of the remaining graph are taken as clusters. This is justified by the fact that edges between two distinct dense clusters are likely to have a high centrality, because shortest paths from one cluster to another will often include that edge (there are few alternatives).
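
The procedure can be sketched as follows for a small unweighted, undirected graph without self-loops. This is the full (non-sampled) variant that removes one edge per iteration, written for illustration; it is not GraphWeb's implementation. The edge scores are computed with a breadth-first accumulation in the style of Brandes' algorithm, and the parameter edges_to_keep (an absolute number rather than a percentage) is an assumption of this sketch.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph given as {node: set}."""
    score = defaultdict(float)
    for source in adj:
        # Breadth-first search counting shortest paths (sigma) from the source.
        dist = {source: 0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing each path count among predecessors.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every node pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components of {node: set}, as a sorted list of sorted lists."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        group, queue = {start}, deque([start])
        while queue:
            for nb in adj[queue.popleft()]:
                if nb not in group:
                    group.add(nb)
                    queue.append(nb)
        seen |= group
        result.append(sorted(group))
    return sorted(result)


def betweenness_clustering(edges, edges_to_keep):
    """Remove the most central edge until only `edges_to_keep` edges remain."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining = sum(len(nbs) for nbs in adj.values()) // 2
    while remaining > edges_to_keep:
        scores = edge_betweenness(adj)
        a, b = max(scores, key=scores.get)
        adj[a].discard(b)
        adj[b].discard(a)
        remaining -= 1
    return components(adj)


# Two triangles joined by the bridge 3-4.
EDGES = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(betweenness_clustering(EDGES, edges_to_keep=6))
```

In this graph every shortest path between the two triangles passes through the bridge 3-4, so it has the highest centrality (9 node pairs) and is removed first, leaving the two triangles as clusters.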

A full implementation of the algorithm would require finding shortest paths between all pairs of nodes once for every edge that is removed, which would take too much time on large graphs. To solve this problem, we have tried to approximate the centrality by not finding all shortest paths, but selecting a random sample of nodes and finding shortest paths from nodes in the sample to all nodes. We hypothesize that if the sample is large enough, the resulting clustering will approximate the full betweenness centrality clustering. Another possible optimisation is removing several edges on each iteration, which means that fewer iterations are needed to remove the required number of edges.

You can modify three options. The first is the percentage of edges in the graph that will remain in the graph. The smaller it is, the more iterations of the clustering will be made and the more fine-grained the clustering will be. The second parameter is the percentage of nodes that are included in the random sample. For smaller graphs (e.g. IntAct mouse or cattle datasets) it is OK to leave this at 100%, resulting in a true (non-randomised) betweenness centrality clustering. If the graph is bigger, the sample must be decreased. The third parameter is the number of edges removed on each iteration. If it is 1, shortest paths are recomputed after each edge is removed. The bigger it is, the faster the calculation.

To avoid load on our server, the computation is immediately cancelled if it would take too much time (more than 10 billion computational steps, calculated as edges_to_remove*sample_size*num_edges/edges_removed_on_iteration). Some graphs may be just too large to cluster with this method.
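
The step estimate quoted above is simple enough to evaluate before submitting a large graph. The sketch below only restates that formula; the function names and the sample numbers are made up for illustration.

```python
STEP_LIMIT = 10_000_000_000  # 10 billion computational steps


def estimated_steps(edges_to_remove, sample_size, num_edges, edges_removed_on_iteration):
    """Estimate from the text: edges_to_remove * sample_size * num_edges / edges per iteration."""
    return edges_to_remove * sample_size * num_edges / edges_removed_on_iteration


def would_be_cancelled(edges_to_remove, sample_size, num_edges, edges_removed_on_iteration):
    """True if the estimate exceeds the limit."""
    steps = estimated_steps(edges_to_remove, sample_size, num_edges, edges_removed_on_iteration)
    return steps > STEP_LIMIT


# Hypothetical graph: 20000 edges, 2000 of them to be removed, 500 sampled nodes.
print(would_be_cancelled(2000, 500, 20000, 1))   # one edge per iteration
print(would_be_cancelled(2000, 500, 20000, 10))  # ten edges per iteration
```

Removing ten edges per iteration divides the estimate by ten, which in this made-up case brings it from 20 billion down to 2 billion steps, below the limit.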

Basic graph algorithms

Connected components

In an undirected graph, two nodes belong to the same connected component if there exists a path from one to the other. In a directed graph, choosing this will find weakly connected components - connected components in the graph obtained by ignoring all edge directions.

The algorithm takes O(V+E) computation time, where V is the number of nodes and E the number of edges.

Strongly connected components

In a directed graph, two nodes u and v belong to the same strongly connected component if there are directed paths both from u to v and from v to u. Similarly to connected components in undirected graphs, this is an equivalence relation, so every node is in exactly one component. The computation time is O(V+E).

On undirected graphs, this algorithm has the same effect as connected components.
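
As an illustration of the definition, the sketch below finds strongly connected components with Kosaraju's two-pass method, one of the standard O(V+E) algorithms. This document does not say which algorithm GraphWeb itself uses, so treat the choice as an assumption; the function name is made up.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Return the strongly connected components of a directed graph as sorted lists."""
    forward, backward = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)
        nodes.update((u, v))

    # Pass 1: record nodes in order of depth-first completion on the original graph.
    finished, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbours = stack[-1]
            advanced = False
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    advanced = True
                    break
            if not advanced:
                finished.append(node)
                stack.pop()

    # Pass 2: traverse the reversed graph in reverse completion order.
    assigned, result = set(), []
    for start in reversed(finished):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        result.append(sorted(component))
    return sorted(result)


# A > B > C > A form a cycle; C > D leads into the two-node cycle D > E > D.
DIRECTED = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("E", "D")]
print(strongly_connected_components(DIRECTED))
```

Although D and E can be reached from the first cycle, there is no path back, so they form a separate component.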

Biconnected components

A graph is called biconnected if it is connected and the removal of any one node does not disconnect it. A biconnected component of a graph is any maximal biconnected subgraph (i.e. it is not possible to add nodes or edges to the subgraph without losing its biconnectedness). A graph is partitioned into biconnected components by articulation points - nodes whose removal would increase the number of connected components. Articulation points belong to more than one biconnected component, all other nodes - to exactly one component.

This algorithm is meant for undirected graphs. If the graph is directed, it will be transformed into an undirected graph by ignoring all edge directions. The time complexity is O(V+E).
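
The articulation points mentioned above can be found with a single depth-first search. The sketch below shows this for a simple undirected graph (no self-loops or parallel edges). It returns only the articulation points, not the components themselves, and it uses recursion, so it is suitable for small graphs only. It illustrates the concept and is not GraphWeb's code.

```python
from collections import defaultdict


def articulation_points(edges):
    """Return the set of nodes whose removal disconnects their connected component."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    index, low, points = {}, {}, set()

    def visit(node, parent):
        # index: discovery order; low: earliest index reachable from the subtree.
        index[node] = low[node] = len(index)
        children = 0
        for nb in adj[node]:
            if nb == parent:
                continue
            if nb in index:
                low[node] = min(low[node], index[nb])
            else:
                children += 1
                visit(nb, node)
                low[node] = min(low[node], low[nb])
                # The subtree below nb cannot reach above node without passing through it.
                if parent is not None and low[nb] >= index[node]:
                    points.add(node)
        # A search root is an articulation point only if it has several subtrees.
        if parent is None and children > 1:
            points.add(node)

    for node in list(adj):
        if node not in index:
            visit(node, None)
    return points


# Two triangles sharing the node C.
BOWTIE = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("E", "C")]
print(articulation_points(BOWTIE))
```

In this example C is the only articulation point, so the graph has two biconnected components, {A,B,C} and {C,D,E}, and C belongs to both.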

Maximal cliques

A clique is a subgraph where each pair of nodes has an edge between them. A maximal clique is a clique that is not a part of a bigger clique. The algorithm finds maximal cliques of a graph with 4 or more nodes. The algorithm is very slow when the graph is very big and dense. Because of that, the program will stop searching for new cliques if more than 1000 are found. Also, searching for cliques is disabled for very dense graphs where the number of connected node pairs is over 50% of total node pairs.
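
A common way to enumerate maximal cliques is the Bron-Kerbosch algorithm with pivoting, sketched below. This document does not state which algorithm GraphWeb uses, so this is only an illustration of the concept. The parameters min_size and limit mimic the minimum clique size of 4 and the cap of 1000 cliques described above, but the sketch stops as soon as limit cliques have been collected, which may differ slightly from the server's behaviour.

```python
from collections import defaultdict


def maximal_cliques(edges, min_size=4, limit=1000):
    """Enumerate maximal cliques with at least `min_size` nodes (Bron-Kerbosch with pivot)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    cliques = []

    def grow(current, candidates, excluded):
        if len(cliques) >= limit:
            return
        if not candidates and not excluded:
            # Nothing can be added any more, so `current` is maximal.
            if len(current) >= min_size:
                cliques.append(sorted(current))
            return
        # Choosing a pivot with many candidate neighbours prunes the search.
        pivot = max(candidates | excluded, key=lambda n: len(adj[n] & candidates))
        for node in list(candidates - adj[pivot]):
            grow(current | {node}, candidates & adj[node], excluded & adj[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    grow(set(), set(adj), set())
    return sorted(cliques)


# A complete graph on A, B, C, D plus a triangle D, E, F.
CLIQUE_EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("D", "E"), ("E", "F"), ("D", "F"),
]
print(maximal_cliques(CLIQUE_EDGES))
print(maximal_cliques(CLIQUE_EDGES, min_size=3))
```

With the default minimum size only the four-node clique is reported; the triangle D, E, F is a maximal clique too, but it is below the size threshold.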

Node grouping

Hub-based modules

For each node in the graph, a subgraph will be given containing all nodes whose (unweighted) distance from the hub is not greater than the given parameter. For example, if the parameter equals 1, each module will contain a central node with all of its neighbours. If it equals 2, neighbours of the hub's neighbours are also added, etc. Visualisations in this case show not only edges from the hub to the neighbours, but also edges between the neighbours.
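
A module of this kind is simply the set of nodes reached by a breadth-first search that stops at the given distance. The sketch below assumes an undirected, unweighted graph; the function name hub_module is made up.

```python
from collections import defaultdict, deque


def hub_module(edges, hub, max_distance):
    """Return all nodes whose unweighted distance from `hub` is at most `max_distance`."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    distance = {hub: 0}
    queue = deque([hub])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand beyond the allowed radius
        for nb in adj[node]:
            if nb not in distance:
                distance[nb] = distance[node] + 1
                queue.append(nb)
    return sorted(distance)


# A simple path A - B - C - D.
PATH = [("A", "B"), ("B", "C"), ("C", "D")]
print(hub_module(PATH, "A", 1))
print(hub_module(PATH, "A", 2))
print(hub_module(PATH, "B", 1))
```

Running the function once for every node of the graph gives one module per hub, which is why this method usually produces heavily overlapping modules.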

Whole graph

Returns just one module which contains all nodes of the input graph, after filtering has been applied to it.

Options and settings

Labels and edge weights

Assign more weight to smaller networks

If selected, those labels that don't have their weights specified in the input will have weights chosen so that they will be inversely proportional to the number of edges in each label (that is, the product of the number of edges in a label and the label weight will be constant). This means that labels with fewer edges are considered more trustworthy and edges having them get greater weights. Labels that have their weights specified in the input are not affected by this setting. The weights of the unknown labels are scaled so that the equality

SUM(weight(label) * num_of_edges(label)) = Average_weight * SUM(num_of_edges(label))

holds, where both sums are over those labels that don't have their weights specified in the input. Average_weight is equal to 1 by default. This can be changed by assigning a weight to the label DEFAULT, that is the label that edges that don't have a label specified belong to. In that case, Average_weight will equal the weight of DEFAULT.

For an explanation of label weights and how they affect edge weights, see Input syntax and Edge weights explained.

For the purpose of this setting, the number of edges in a label is calculated after unknown and ambiguous genes have been removed, but before any other node or edge filtering is applied.
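
The two conditions above determine the weights uniquely. If weight(label) * num_of_edges(label) equals the same constant for each of the k unweighted labels, the left side of the equality is k times that constant, so the constant is Average_weight * SUM(num_of_edges(label)) / k. The sketch below computes the weights this way; it is derived from the description rather than taken from GraphWeb's code, and the label names and edge counts are invented.

```python
def inverse_size_label_weights(edge_counts, average_weight=1.0):
    """Weights inversely proportional to label size, scaled as in the equality above.

    `edge_counts` maps each label without a user-specified weight to its number of edges.
    """
    constant = average_weight * sum(edge_counts.values()) / len(edge_counts)
    return {label: constant / count for label, count in edge_counts.items()}


counts = {"SMALL": 100, "LARGE": 300}
label_weights = inverse_size_label_weights(counts)
print(label_weights)
```

For these made-up counts the constant is 400 / 2 = 200, so the smaller label gets weight 2 and the larger one weight 2/3, and the weighted sum 100 * 2 + 300 * 2/3 is again 400.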

Create global network

If this is selected, labels will not be shown in the output. Instead, all edges are merged into one label and the weight of each edge will be set to the sum of the weights of this edge in each label. This setting also affects edge lists and visualisations, where the label structure of the input will not be displayed. In the edge lists of modules, each edge will have its weight specified separately, without the use of labels. Also, label distribution in modules is not shown with this setting. The setting may be useful if the number of labels is very large, since edges belonging to very many labels will look strange on visualisations otherwise. Note that using labels in the input may still be useful with this setting because it helps to determine edge weights.

Node filtering

This section contains options that are quite different, but they all have in common the fact that they remove some nodes from the graph.

Remove unknown genes
Remove ambiguous genes

If selected, nodes in the input that either are not valid gene or protein IDs or are ambiguous (are associated with either no Ensembl gene name or more than one Ensembl gene name in g:Convert) are not included in the graph. By following the link Node name conversion on the output page, you can see that those nodes are marked as unknown or ambiguous. If the checkbox is not selected, those nodes will remain in the graph under their own names (unlike other nodes, whose names will be changed to the gene names).

Keep N% of highest degree nodes

This option allows exploring connections between the "hubs" of the network - nodes that have the highest number of adjacent edges. You specify a percentage of nodes to be retained. Nodes with fewer neighbours are removed. If two nodes have the same number of neighbours, they are either both retained or both removed, so the actual number of nodes retained may be higher if there are a lot of equal-degree nodes near the specified cut. The default value is 100%, which means no filtering will take place.

Note that the degrees of nodes are calculated after edge filtering has taken place, if you are using it.
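
The handling of ties can be made concrete with a short sketch: find the degree of the last node that would be kept and retain every node with at least that degree. How GraphWeb rounds the percentage is not stated here, so rounding up is an assumption of this sketch, and the function name is made up.

```python
import math
from collections import defaultdict


def keep_highest_degree_nodes(edges, percent):
    """Return the nodes retained when keeping `percent` % of the highest-degree nodes."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    degrees = {node: len(nbs) for node, nbs in adj.items()}

    # Assumption: the number of nodes to keep is rounded up.
    target = math.ceil(len(degrees) * percent / 100.0)
    if target <= 0:
        return set()
    ranked = sorted(degrees.values(), reverse=True)
    cutoff = ranked[min(target, len(ranked)) - 1]
    # Nodes tied with the last kept node are retained as well.
    return {node for node, degree in degrees.items() if degree >= cutoff}


# Hub H with four neighbours, two of which (a and b) are also connected to each other.
STAR = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"), ("a", "b")]
print(sorted(keep_highest_degree_nodes(STAR, 40)))
```

Here 40% of five nodes is two nodes, but a and b both have degree 2, so three nodes (H, a and b) are retained.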

Network neighbourhood

Removes all nodes, except those that are within a given distance of the given set of nodes, which can be entered into the textbox below. If you don't want any neighbours included, use the distance 0. Node names can be given as any gene/protein ID supported by g:Convert.

This can be useful if you know a set of nodes in advance and are interested in filtering out the subgraph induced by these nodes and edges between them. It is also possible to include a closer or further neighbourhood of this set. If you have a large network, one or more genes you're especially interested in and you want to find out which of those genes are close to each other in the graph, this option will suit you. It probably works best with the "connected components" algorithm, as you can see immediately which nodes in the set are connected to each other.

Edge filtering

The following settings allow removing some edges from the graph. If more than one is selected, all edges that meet at least one removal criterion are removed. The removed edges don't appear in Graphviz visualisations and the label distribution barplots. The removal takes place after the graph has been read in and label weights have been calculated (so the removed edges still affect label weights if Assign more weight to smaller networks is selected). By the default settings, no edge filtering is done.

Remove edges with less than N labels

Edges that have less than a specified number of labels are removed from the graph. This may allow splitting the graph further if the union of all labeled graphs is too dense to be split.

Keep N% of heaviest edges

Works much like the previous option, but instead of specifying a weight yourself, you can give a percentage of edges that should remain. Edges with lower weight that don't fall among the percentage given are removed. If there are several edges with equal weight on the "border" of the percentage, all of those edges will be kept - that is, an edge is removed if and only if all edges with the same weight are removed too. Because of this, the option is not suitable for graphs where all edges have equal weights.

Module filtering

Hide modules with less than N nodes

Sometimes an algorithm will produce a lot of very small modules (with 1-2 nodes each) and only a few large and interesting ones. This setting is useful in that case. Modules that are not shown don't need to be queried in g:Profiler, so setting this and the following parameter can speed up processing.

Show N largest modules

Similarly to the previous option, this allows decreasing the number of modules displayed. Only modules that belong among the specified number of largest modules by the number of nodes will be shown in the output table. If the number in the textbox is zero, all found modules are shown. The default value is 100. If several modules of the same size are tied for the last place to be kept, it is not determined which of them will be shown.

Scores and sorting

Calculate g:Profiler scores

If this is selected, genes belonging to each module found are automatically queried in g:Profiler. From the results of the query, a score is calculated for each module.

Since the queries take a lot of time, turning this box off increases speed significantly and it is recommended if you are not interested in g:Profiler results.

Sort modules by functional score

By default, modules in the output are sorted by the number of nodes in them. However, if Calculate g:Profiler scores is selected, it is also possible to sort the output by the score.

Note that when sorting by score is selected, all modules will have to be queried in g:Profiler before any results can be shown. This may take several minutes if the number of modules is large, causing a delay before any output is shown. When sorting by module size, your browser will be able to show bigger modules immediately after they have been processed.

Hide insignificant modules

This option, turned on by default, hides those modules that g:Profiler considers biologically insignificant, i.e. for which it doesn't find any annotations (clicking on the "execute g:Profiler" link would give no results). This means that the set of genes in this module was not found to have any significant common biological properties. This checkbox has an effect only if the checkbox Calculate g:Profiler scores is selected. It must be turned off when analysing a non-biological network.

Output

Structure of the output

The output consists mainly of a table with one row for each graph module found, and most of this section describes the columns of that table. Depending on the initial settings, some of the columns may not appear.

If there is more than one label, each label is assigned a colour which is used in images. A legend showing the colour of each label is displayed near the beginning of the output.

The total number of modules found by the splitting algorithm is also displayed. This refers to all existing modules, even those that are not shown in the table due to settings.

Node name conversion

The output begins with a link "Node name conversion", which displays a table showing how node names in the input map to gene names, which are used as node names in the output. The link opens in the upper frame of the window. Nodes whose names do not correspond to known genes in our database are marked as 'N/A'. Such nodes either appear in the output under their original names or are removed from the graph entirely, depending on whether the option Remove unknown genes is enabled.

Comp#

An identification number for the module. The largest module (the one with the most nodes) gets the number 1, the second largest gets 2, and so on. For hub-based modules, the numbers are accompanied by the names of the respective hub nodes.

Number of nodes

The number of nodes belonging to the module.

List nodes

A link to a text file containing names of all nodes in the module, one per line.

List edges

A text file that is in the input format for this application, describing the subgraph induced by nodes of this module. If you wish to further analyse this module and break it apart with another algorithm, you can easily do it by copying and pasting the file to a new query. Besides edges, the file also contains all label and weight information. Submitting this file as an input to the application and selecting Whole graph as the algorithm should give exactly the same result as the original module. (See also the following option.)
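
The "subgraph induced by nodes of this module" keeps exactly those edges whose two endpoints both belong to the module. A minimal sketch follows; the tuple layout (source, target, label, weight) is a hypothetical in-memory representation chosen for the example and is not GraphWeb's input syntax.

```python
def induced_edges(edges, module_nodes):
    """Keep only the edges whose two endpoints both lie in module_nodes.

    edges: iterable of (source, target, label, weight) tuples (hypothetical layout).
    module_nodes: iterable of node names belonging to the module.
    """
    members = set(module_nodes)
    return [e for e in edges if e[0] in members and e[1] in members]


edges = [
    ("A", "B", "ppi", 1.0),
    ("B", "C", "ppi", 0.5),
    ("C", "D", "coexp", 2.0),  # D is outside the module, so this edge is dropped
]
print(induced_edges(edges, {"A", "B", "C"}))
```

Labels and weights travel with each kept edge, which matches the statement that the file contains all label and weight information.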

Zoom in

Clicking on the link "Zoom in" in this column reloads the upper frame (containing the form) with the form filled in with the syntax representing this module. The organism used in the current graph is also inserted into the form. This makes it easy to split a module further with a new algorithm.

Label distribution

The column exists if there is more than one label in the input. It contains a bar graph showing the distribution of labels between the edges of the module.

Label distribution image example

This example image corresponds to a module that has edges with three labels, denoted by the colours red, blue and green. The image shows that about a third of all edges in the module only have the red label. Another third of edges have all three labels. A sixth of edges have the red and blue labels, but not the green one. Finally, the remaining sixth only have the blue label.
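
The proportions in such a bar graph can be reproduced by counting how many edges carry each exact combination of labels. The sketch below uses six invented edges that match the example above; the function name label_distribution and the data are illustrative assumptions, not part of GraphWeb.

```python
from collections import Counter
from fractions import Fraction


def label_distribution(edge_labels):
    """Map each label combination to the share of edges carrying exactly that combination."""
    counts = Counter(frozenset(labels) for labels in edge_labels)
    total = sum(counts.values())
    return {combo: Fraction(count, total) for combo, count in counts.items()}


# One set of labels per edge (invented data mirroring the example image).
edge_labels = [
    {"red"}, {"red"},                                   # red only
    {"red", "blue", "green"}, {"red", "blue", "green"}, # all three labels
    {"red", "blue"},                                    # red and blue, not green
    {"blue"},                                           # blue only
]
dist = label_distribution(edge_labels)
for combo, share in dist.items():
    print(sorted(combo), share)
```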

Score

The score for the module is calculated from the p-values found by g:Profiler for the set of nodes of the module. The purpose of the score is to give you an idea of which modules are likely to be biologically relevant and worth looking at. The smaller the p-values, the greater the score. The score is calculated by summing the logarithms of all significant p-values and taking the absolute value of the resulting negative number. The result is then divided by the number of nodes in the module in order to remove the bias towards very big modules. This column appears only if the checkbox Calculate g:Profiler scores is selected.
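
The calculation described above can be written out as a short Python sketch. The text does not say which logarithm base is used, so the base is left as a parameter here (natural logarithm by default); that choice, the function name module_score and the example p-values are all assumptions made for illustration.

```python
import math


def module_score(significant_p_values, n_nodes, base=math.e):
    """Absolute value of the summed log p-values, divided by the module size.

    significant_p_values: p-values (each > 0) already judged significant.
    n_nodes: number of nodes in the module.
    base: logarithm base; an assumption, since the text does not specify it.
    """
    if n_nodes <= 0:
        raise ValueError("a module must contain at least one node")
    log_sum = sum(math.log(p, base) for p in significant_p_values)
    return abs(log_sum) / n_nodes


# Two significant annotations in a 10-node module, using base-10 logarithms:
# |log10(0.001) + log10(0.01)| / 10 = |-3 + -2| / 10, i.e. approximately 0.5.
print(module_score([1e-3, 1e-2], n_nodes=10, base=10))
```

A module with no significant p-values gets a score of 0 under this sketch, since the sum over an empty list is zero.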

g:Profiler annotations

A link to the web interface of g:Profiler that gives a graphical output for the genes in this module. The link is opened in a new window, so it's safe to click on it even when loading the results page (calculating all the results) isn't finished yet.

Visualisations

A visualisation of the graph module, drawn using Graphviz. It shows the nodes and edges of the graph, as well as the labels each edge has (shown by the colour of the edge) and edge weights obtained by summing the weight of the edge for each label (weights equal to 1 are not displayed).
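
As a rough illustration of how the displayed weight of an edge is obtained, the sketch below sums the per-label weights of each edge and then leaves out captions for totals equal to 1. The tuple layout (source, target, label, weight) and both function names are hypothetical; node pairs are used as written, so an undirected graph would need its endpoints put into a fixed order first.

```python
from collections import defaultdict


def displayed_weights(labelled_edges):
    """Sum the weight of each edge over all labels it carries."""
    totals = defaultdict(float)
    for source, target, label, weight in labelled_edges:
        totals[(source, target)] += weight
    return dict(totals)


def weight_captions(totals):
    """Captions to draw next to edges; totals equal to 1 are not displayed."""
    return {edge: str(total) for edge, total in totals.items() if total != 1}


labelled_edges = [
    ("A", "B", "ppi", 0.5),
    ("A", "B", "coexp", 1.5),  # same edge under a second label
    ("B", "C", "ppi", 1.0),
]
totals = displayed_weights(labelled_edges)
print(totals)
print(weight_captions(totals))
```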

When you click the link, a pop-up window opens, the visualisation is calculated (which may take some time for large modules) and displayed as a PNG image.

Two kinds of visualisations are available: labeled (with node names and edge weights) and compact (with no labels and nodes represented by dots). The compact images are recommended for getting an overview of big graphs, because they are much smaller in size.

Since big graphs take a long time to visualise and the resulting images tend to be too large to manage, visualisations are completely disabled for modules with over 300 nodes. Labeled pictures are already disabled at 60 nodes, because these images are several times bigger than compact ones.

Summary row

This row represents a graph formed by the union of all displayed modules, that is, modules that weren't filtered out. A node list and an edge list are available, containing those edges and nodes that exist in any of the shown modules. There is also a Zoom in link, which makes it possible to analyse the graph obtained by removing all edges that only appeared in filtered-out modules.
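
The union described above can be sketched as follows. The dictionary layout with "nodes" and "edges" keys and the function name summary_graph are assumptions made for the example, not GraphWeb's internal representation.

```python
def summary_graph(displayed_modules):
    """Union of the nodes and edges of all modules that are shown in the table."""
    nodes, edges = set(), set()
    for module in displayed_modules:
        nodes |= set(module["nodes"])
        edges |= set(module["edges"])
    return nodes, edges


# Only modules that passed the filters are passed in; filtered-out modules
# (and edges that appear only in them) never reach the union.
displayed = [
    {"nodes": {"A", "B", "C"}, "edges": {("A", "B"), ("B", "C")}},
    {"nodes": {"C", "D"}, "edges": {("C", "D")}},
]
nodes, edges = summary_graph(displayed)
print(sorted(nodes))
print(sorted(edges))
```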

g:Cocoa

g:Cocoa (Compact Compare of Annotations) is an extension to g:Profiler that allows comparing the g:Profiler annotations of several lists of genes. Clicking on this link gives a comparison of the g:Profiler results of all modules shown in the table (one gene list is created from each module). In g:Cocoa the modules are shown in the same order in which they appear in the GraphWeb table. g:Cocoa results are shown only for those displayed modules that have fewer than 1500 nodes.

Searching the output

The search feature enables finding the module(s) that a node belongs to after the modules have been calculated. Type the name of the node into the text field and press "Find"; a new window opens with the list of modules that contain that node. The name entered in the search field should be the gene name that is used in the output. If you wish to search for a node by the identifier used in the input, first look up its standardised name by clicking on the Node name conversion link.

Known problems

If any of these problems (or any other problems we have forgotten to mention here) become a nuisance, please use our contact form to notify us, so that we can consider fixing them.

  • If the number of labels becomes large (over 27), the colours are selected at random, so two labels may end up with indistinguishable colours or a label may get a colour too light to see on the visualisation.
  • For very large graphs (including some of our example inputs), the clique algorithm sometimes crashes without displaying any results.



GraphWeb 2007-2008
Laur Tooming & Jüri Reimand & Jaak Vilo @ BIIT Group, Institute of Computer Science, University of Tartu.