Introduction

The sequence logo has been used for more than three decades to depict the consensus and diversity of sequence motifs. However, the in-line sequence logo fails to reveal intra-motif dependencies and is therefore insufficient to fully characterize sequence motifs, as many studies have demonstrated (Man et al, 2001; Bulyk et al, 2004).

CircularLogo (circular sequence logo) is a web application for creating circular sequence logos for DNA (or RNA) motifs. By virtue of the circular layout, CircularLogo not only displays nucleotide stacks like the traditional sequence logo, but also depicts intra-motif dependencies using linked arches.

CircularLogo takes JSON or FASTA format files as input and generates circular sequence logos in PNG, JPEG or TIFF format (see details below).

When the input file is in FASTA format, CircularLogo provides two basic metrics (mutual information and the chi-square statistic) to quantify the intra-motif dependencies. However, CircularLogo is primarily a motif visualization tool. When more sophisticated statistical models are needed to quantify intra-motif dependencies, users need to perform that analysis themselves first and then prepare a JSON format input file.

CircularLogo can be accessed from our webserver or built from source code by following the instructions below.

Run CircularLogo online

Screen shot of CircularLogo webserver

Build CircularLogo webserver from source code

Users can build the CircularLogo webserver on a laptop or other local computer by following the instructions below.

Step 1 (Prerequisites)

These packages need to be installed before installing CircularLogo: Django, Biopython, NumPy and SciPy.

Use pip to install these prerequisites:

pip install Django
pip install biopython
pip install numpy
pip install scipy

Step 2 (Download, install and start webserver)

Download CircularLogo source code from here.

unzip circularlogo_VERSION.zip
cd circularlogo_VERSION
python manage.py runserver 0.0.0.0:8000        #start webserver

Step 3 (Use CircularLogo webserver)

  • Open a web browser, input this address (http://localhost:8000/circularlogo/index.html) into the URL bar. CircularLogo is now ready to use.
  • You can test your local CircularLogo webserver by clicking “Load FASTA Demo” or “Load JSON Demo” and then clicking “Create Logo”. You should be able to view the logo created from our test data.

Step 4 (Quit webserver)

You can quit the CircularLogo webserver at any time by pressing CONTROL-C.

Input formats

CircularLogo takes JSON or FASTA format files as input.

When to use a JSON file as input?

A JSON file is useful when you have prior knowledge about intra-motif dependencies and your goal is to visualize them.

How to prepare a JSON file?

JSON (JavaScript Object Notation) is a plain-text format. You can create a JSON file manually or edit an existing one. Note that CircularLogo can also export a JSON file (to use as a template) when you input a FASTA file.

A JSON format file describes a circular motif in 3 main sections (id, nodes and links); the example below also carries background and pseudocounts entries:

{
       "id":"AR",
       "background":{"key":["A","T","C","G"],"val":[0.25,0.25,0.25,0.25]},
       "pseudocounts":{"key":["A","T","C","G"],"val":[0.25,0.25,0.25,0.25]},
       "nodes":[
               {"index":0,"label":"1","bit":0.779265366373,"base":["C","T","G","A"],"freq":[0.006,0.043,0.347,0.604]},
               {"index":1,"label":"2","bit":1.57103613124,"base":["C","T","A","G"],"freq":[0.012,0.017,0.032,0.939]},
               {"index":2,"label":"3","bit":0.609476284231,"base":["G","C","T","A"],"freq":[0.027,0.053,0.388,0.532]},
               {"index":3,"label":"4","bit":1.79237941693,"base":["C","T","G","A"],"freq":[0.001,0.012,0.012,0.976]},
               {"index":4,"label":"5","bit":1.7608900955,"base":["A","G","T","C"],"freq":[0.001,0.012,0.017,0.97]},
               {"index":5,"label":"6","bit":1.35037781295,"base":["C","G","T","A"],"freq":[0.017,0.017,0.079,0.888]},
               {"index":6,"label":"7","bit":0.00719165322846,"base":["A","G","C","T"],"freq":[0.187,0.249,0.28,0.285]},
               {"index":7,"label":"8","bit":0.0651518291752,"base":["G","C","A","T"],"freq":[0.146,0.202,0.305,0.347]},
               {"index":8,"label":"9","bit":-0.00870652334553,"base":["A","G","C","T"],"freq":[0.233,0.238,0.259,0.269]},
               {"index":9,"label":"10","bit":1.43879070277,"base":["G","C","A","T"],"freq":[0.001,0.017,0.084,0.898]},
               {"index":10,"label":"11","bit":1.72492316015,"base":["C","T","A","G"],"freq":[0.006,0.006,0.022,0.965]},
               {"index":11,"label":"12","bit":1.7608900955,"base":["G","C","A","T"],"freq":[0.001,0.012,0.017,0.97]},
               {"index":12,"label":"13","bit":0.525556517288,"base":["C","G","A","T"],"freq":[0.043,0.068,0.341,0.548]},
               {"index":13,"label":"14","bit":1.34114459935,"base":["G","A","T","C"],"freq":[0.017,0.043,0.048,0.893]},
               {"index":14,"label":"15","bit":0.668493029832,"base":["G","A","C","T"],"freq":[0.006,0.068,0.383,0.543]}
               ],
       "links":[
               {"source":3,"target":4,"value":162.623389175},
               {"source":3,"target":11,"value":162.623389175},
               {"source":3,"target":10,"value":160.757409794},
               {"source":4,"target":11,"value":160.757409794},
               {"source":4,"target":10,"value":158.901739691},
               {"source":10,"target":11,"value":158.901739691},
               {"source":1,"target":3,"value":151.613079897},
               {"source":1,"target":4,"value":149.808956186},
               {"source":1,"target":11,"value":149.808956186},
               {"source":1,"target":10,"value":148.015141753},
               {"source":3,"target":9,"value":138.375966495},
               {"source":4,"target":9,"value":136.65431701},
               {"source":9,"target":11,"value":136.65431701},
               {"source":3,"target":13,"value":136.056378866},
               {"source":9,"target":10,"value":134.942976804},
               {"source":3,"target":5,"value":134.778028351},
               {"source":4,"target":13,"value":134.34503866},
               {"source":11,"target":13,"value":134.34503866},
               {"source":4,"target":5,"value":133.076997423},
               {"source":5,"target":11,"value":133.076997423},
               {"source":10,"target":13,"value":132.644007732},
               {"source":5,"target":10,"value":131.386275773},
               {"source":1,"target":9,"value":126.571842784},
               {"source":1,"target":13,"value":124.324420103},
               {"source":1,"target":5,"value":123.118234536},
               {"source":9,"target":13,"value":112.40689433},
               {"source":6,"target":14,"value":10.5821520619},
               {"source":2,"target":8,"value":9.30380154639},
               {"source":6,"target":12,"value":9.16978092784},
               {"source":8,"target":12,"value":9.12854381443}
               ]
}
  • id section: specifies the name of the motif, which is displayed at the top of the graph.
  • nodes section: describes the motif residues at each position.
      • index: indicates the order (anticlockwise) of the residues along the circular track. The index starts from 0.
      • label: indicates the label of each position.
      • bit: indicates Shannon’s information content of the residues at a position. The method to calculate bit is described here.
      • base: denotes the array of residues. The order of the residues must match the order of the frequencies in freq. For example, at the position with index = 0, the frequency of each residue is “C” = 0.006, “T” = 0.043, “G” = 0.347, “A” = 0.604.
      • freq: denotes the frequency of each residue, in the same order as base.
  • links section: describes the links between motif residues.
      • source: indicates the “index” of the source node (the start position of the link).
      • target: indicates the “index” of the target node (the end position of the link).
      • value: indicates the strength of the link between the source and target nodes. The values associated with links are linearly mapped to the “width” of the linked ribbons in the output image. We generally advise using the smaller index as “source” and the bigger one as “target”, but this is not mandatory because the layout of the motif graph is circular.
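
Before uploading a hand-edited JSON file, it can help to check it against the rules listed above. The following is a minimal sketch, not part of CircularLogo: the function name validate_motif, the 0.01 tolerance on the frequency sum, and the single link in the demo data are all assumptions made for illustration. The two nodes are copied from the example above.

```python
import json
import math

def validate_motif(motif):
    """Return a list of human-readable problems found in a motif dict."""
    problems = []
    nodes = motif.get("nodes", [])
    indices = {node["index"] for node in nodes}
    for node in nodes:
        # base and freq are parallel arrays, so they must have equal length.
        if len(node["base"]) != len(node["freq"]):
            problems.append(f"node {node['index']}: base and freq differ in length")
        # Frequencies in the example are rounded, so allow a loose tolerance
        # (the 0.01 threshold is an assumption, not a CircularLogo rule).
        elif not math.isclose(sum(node["freq"]), 1.0, abs_tol=0.01):
            problems.append(f"node {node['index']}: freq sums to {sum(node['freq']):.3f}")
    for link in motif.get("links", []):
        # source and target must both refer to an existing node index.
        for end in ("source", "target"):
            if link[end] not in indices:
                problems.append(
                    f"link {link['source']}->{link['target']}: unknown {end} index {link[end]}"
                )
    return problems

# Two nodes taken from the example above; the link value is hypothetical.
example = json.loads("""
{"id": "demo",
 "nodes": [
   {"index": 0, "label": "1", "bit": 0.779265366373,
    "base": ["C", "T", "G", "A"], "freq": [0.006, 0.043, 0.347, 0.604]},
   {"index": 1, "label": "2", "bit": 1.57103613124,
    "base": ["C", "T", "A", "G"], "freq": [0.012, 0.017, 0.032, 0.939]}
 ],
 "links": [{"source": 0, "target": 1, "value": 10.5}]}
""")
print(validate_motif(example))
```

An empty list means no problem was detected. The check does not verify the bit values, because the exact method CircularLogo uses to compute them is not reproduced here.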

CircularLogo transforms the above JSON format file into a circular logo (below), in which the width of each link reflects the strength of the dependence between nucleotide stacks.

alternate text

When to use a FASTA file as input?

Use a FASTA file when you have motif sequences (e.g. from a de novo motif search of your ChIP-seq peaks) and your goal is to explore the intra-motif dependencies.

How to prepare a FASTA format file?

FASTA is also a plain-text format (see the example below). When preparing a FASTA format file, it is a good idea to extend each motif site by X nucleotides both upstream and downstream. The extended parts represent the genome background, and the dependencies calculated from these nucleotides (usually very weak) arguably represent genome background noise.

>site1
GGTACAGTTTGTACA
>site2
AATACAGATTGTTCT
>site3
AATACAGAGTGTACT
>site4
AGAACATAATGTACA
>site5
AGTACACTCTGTAAT
>site6
GGAACATTTTGTTTT
>site7
AGCACAAGATGTTCT
>site8
AGTACTTCCTGTTCC
>site9
GGTACACTGTGTACT
>site10
GGTACAAACTGTTCT
...

In this scenario, the input FASTA file is automatically transformed into the JSON format motif representation, with the intra-motif dependencies measured by the chi-square statistic or mutual information. In addition to these two metrics, many other statistical methods have been developed to quantify intra-motif dependencies:

  • the generalized weight matrix models (Zhou et al. 2004),
  • sparse local inhomogeneous mixture (Slim) models (Jens et al. 2011),
  • transcription factor flexible models (TFFMs) based on hidden Markov models (Mathelier et al. 2013),
  • a binding energy model (BEM) (Zhao et al. 2012),
  • inhomogeneous parsimonious Markov models (PMMs) (Eggeling et al. 2015).

Users who want to use these methods need to customize the JSON file themselves (a template can be exported from the CircularLogo webserver).
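
To illustrate the kind of transformation involved, the sketch below derives the per-position base and freq arrays of the nodes section from aligned sequences. It is only an approximation under stated assumptions: the function name build_nodes is hypothetical, residues are sorted by ascending frequency because the JSON example above appears to be ordered that way, and pseudocounts and the bit field are left out since their exact calculation is not given here.

```python
from collections import Counter

def build_nodes(sequences, alphabet="ACGT"):
    """Turn equal-length sequences into a list of per-position node dicts."""
    nodes = []
    # zip(*sequences) yields one column (tuple of residues) per motif position.
    for index, column in enumerate(zip(*sequences)):
        counts = Counter(column)
        total = len(column)
        # Sort residues by ascending count; ties keep the alphabet order.
        ranked = sorted(alphabet, key=lambda base: counts[base])
        nodes.append({
            "index": index,            # 0-based position on the circular track
            "label": str(index + 1),   # 1-based label, as in the JSON example
            "base": ranked,
            "freq": [round(counts[base] / total, 3) for base in ranked],
        })
    return nodes

# The first three sites of the FASTA example above.
sites = ["GGTACAGTTTGTACA", "AATACAGATTGTTCT", "AATACAGAGTGTACT"]
for node in build_nodes(sites)[:3]:
    print(node)
```

The resulting list could be pasted into the nodes section of an exported JSON template, with the links section then filled in from whichever dependency model is preferred.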

NOTE:
  • DNA or RNA sequences must be the same length
  • DNA or RNA sequences must contain no line breaks
  • Do not mix DNA and RNA sequences together in one file
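
The three rules above can be checked before uploading a file. This is a rough sketch rather than the check CircularLogo itself performs: the function name check_fasta is hypothetical, and detecting a DNA/RNA mix by looking for both T and U is a simplifying assumption.

```python
def check_fasta(text):
    """Return a list of problems that violate the three input rules."""
    problems = []
    records = []  # (name, sequence) pairs; sequence is None until one is read
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], None))
        elif not records:
            problems.append("sequence found before any '>' header")
        elif records[-1][1] is not None:
            # A second sequence line under the same header means a line break.
            problems.append(f"{records[-1][0]}: sequence spans more than one line")
        else:
            records[-1] = (records[-1][0], line.upper())
    seqs = [seq for _, seq in records if seq is not None]
    lengths = {len(seq) for seq in seqs}
    if len(lengths) > 1:
        problems.append(f"sequences differ in length: {sorted(lengths)}")
    # Assumption: T marks DNA and U marks RNA.
    if any("T" in seq for seq in seqs) and any("U" in seq for seq in seqs):
        problems.append("file mixes DNA (T) and RNA (U) sequences")
    return problems

demo = ">site1\nGGTACAGTTTGTACA\n>site2\nAATACAGATTGTTCT\n"
print(check_fasta(demo))
```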

Which metric to use? Chi-square or Mutual Information?

Although both chi-square and mutual information are used to measure dependency, they are two different metrics and should be used in different conditions.

Essentially, the chi-square statistic measures the “co-occurrence” of nucleotides at two different positions. Chi-square is therefore better suited to measuring dependency between two conserved (i.e. less variable) positions: for two highly variable positions, the dinucleotide frequencies (which are used to measure co-occurrence and to calculate the chi-square statistic) are close to the background (i.e. 1/16), so the chi-square statistic is close to 0.
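
The behaviour described above can be reproduced with a small sketch. It assumes that observed dinucleotide counts are compared against a uniform background of 1/16 per dinucleotide, as the paragraph suggests; the exact formula CircularLogo uses may differ (for example in its handling of pseudocounts or non-uniform backgrounds), and the function name chi_square_vs_background is hypothetical.

```python
from collections import Counter
from itertools import product

def chi_square_vs_background(sequences, i, j, alphabet="ACGT"):
    """Chi-square of dinucleotide counts at (i, j) against a uniform background."""
    n = len(sequences)
    observed = Counter((seq[i], seq[j]) for seq in sequences)
    # Assumption: every dinucleotide is expected in 1/16 of the sites (DNA).
    expected = n / len(alphabet) ** 2
    # Sum over all 16 dinucleotides; pairs with other symbols (e.g. N) are ignored.
    return sum(
        (observed[pair] - expected) ** 2 / expected
        for pair in product(alphabet, repeat=2)
    )

# Two fully conserved positions: one dinucleotide dominates.
conserved = ["AA"] * 16
# Two highly variable positions: every dinucleotide occurs once.
variable = [a + b for a, b in product("ACGT", repeat=2)]

print(chi_square_vs_background(conserved, 0, 1))
print(chi_square_vs_background(variable, 0, 1))
```

Under this assumption the conserved pair gives a large statistic, while the fully variable pair gives 0.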

In contrast, mutual information measures the reduction in uncertainty about the nucleotide at one position, given knowledge of the nucleotide at another position. Mutual information is therefore better suited to measuring dependency between two highly variable positions: when a position is highly conserved and dominated by a particular nucleotide, its uncertainty (entropy) approaches 0 bits, and so does the mutual information between it and any other position.
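
A minimal sketch of mutual information between two motif positions makes the contrast concrete. It uses the textbook definition with observed frequencies and no pseudocounts or small-sample correction, which may differ from what CircularLogo applies; the function name mutual_information is an assumption.

```python
import math
from collections import Counter

def mutual_information(sequences, i, j):
    """Mutual information (in bits) between positions i and j."""
    n = len(sequences)
    pair_counts = Counter((seq[i], seq[j]) for seq in sequences)
    counts_i = Counter(seq[i] for seq in sequences)
    counts_j = Counter(seq[j] for seq in sequences)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (n_ab / n) * math.log2(n_ab * n / (counts_i[a] * counts_j[b]))
    return mi

# Variable but perfectly coupled positions: knowing one fixes the other.
coupled = ["AA", "AA", "CC", "CC"]
# Fully conserved positions: nothing is left to be uncertain about.
conserved = ["AA", "AA", "AA", "AA"]

print(mutual_information(coupled, 0, 1))
print(mutual_information(conserved, 0, 1))
```

The coupled pair yields 1 bit, whereas the conserved pair yields 0 bits even though the same dinucleotide always co-occurs.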

How are p-values calculated?

When the chi-square metric is used, the significance of the dependency between two positions is evaluated using the chi-square test. The chi-square metric is calculated as below:

chi-square

When the mutual information metric is used, the significance of the dependency between two positions is evaluated using Chebyshev’s inequality. For example, if the observed mutual information is K standard deviations (stdev) larger than that expected from a random background model, then P <= 1/K**2.
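
The bound can be written as a short function. This is a sketch only: the function and parameter names are hypothetical, and how the background mean and standard deviation are estimated (for instance from shuffled sequences) is not specified here and is left to the caller.

```python
def chebyshev_p_bound(observed, background_mean, background_stdev):
    """Upper bound on the p-value from Chebyshev's inequality: P <= 1 / K**2."""
    if background_stdev <= 0:
        raise ValueError("background_stdev must be positive")
    # K is the number of standard deviations above the background expectation.
    k = (observed - background_mean) / background_stdev
    if k <= 1:
        return 1.0  # the bound is uninformative when K <= 1
    return 1.0 / k ** 2

# Hypothetical numbers: the observation lies 10 standard deviations above background.
print(chebyshev_p_bound(observed=12, background_mean=2, background_stdev=1))
```

Because Chebyshev’s inequality makes no assumption about the shape of the background distribution, the bound is conservative: the true p-value may be much smaller than 1/K**2.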

Output examples

Users can download the FASTA file representing the HNF6 DNA motif from here, or simply click the “Load FASTA Demo” button.

Method = Mutual Information. P-value cutoff = 1 (Displays all links without filtering).

HNF6 all

Method = Mutual Information. P-value cutoff = 1 (Displays all links without filtering). Focus on node 33 (using the focus on node drop-down list to select 33).

HNF6 motif 18th position centered.

Method = Mutual Information. P-value cutoff = 1 (Displays all links without filtering). Focus on node 5 (using the focus on node drop-down list to select 5).

HNF6 motif 33rd position centered.

Method = Mutual Information. P-value cutoff = 0.01 (CircularLogo automatically filters out links with p-value > 0.01).

Method = Mutual Information. P-value cutoff = 0.01 (Filters out links with p-value > 0.01). Other weak links are filtered out manually, with focus on node 33.

LICENSE

CircularLogo is distributed under the GNU General Public License (GPLv2).

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Contact