.. toctree::
   :maxdepth: 2

.. image:: _static/Logo.png
   :height: 300px
   :width: 900 px
   :scale: 50 %
   :alt: alternate text

.. CircularLogo documentation master file, created by
   sphinx-quickstart on Wed Nov 2 16:18:35 2016.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Introduction
========================================

The sequence logo has been used for more than three decades to depict the consensus and diversity of sequence motifs. However, the in-line sequence logo fails to unveil intra-motif dependencies and is therefore insufficient to fully characterize sequence motifs, as many studies have demonstrated (Man et al., 2001; Bulyk et al., 2004).

**CircularLogo** (circular sequence logo) is a web application for creating circular sequence logos for DNA (or RNA) motifs. By virtue of the circular layout, CircularLogo not only displays nucleotide stacks like the traditional sequence logo, but also depicts intra-motif dependencies using linked arches.

**CircularLogo** takes `JSON `_ or `FASTA `_ format files as input and generates circular sequence logos in **PNG**, **JPEG** and **TIFF** format (see details below). When the input file is in FASTA format, CircularLogo provides two basic metrics (mutual information and the chi-square statistic) to quantify intra-motif dependencies. However, **CircularLogo** is by and large a **motif visualization tool**. When more sophisticated statistical models are needed to quantify intra-motif dependencies, users need to perform this analysis themselves first and then prepare a JSON format input file.

**CircularLogo** can be accessed from our `webserver `_ or easily built from `source code `_ following the instructions below.
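
The workflow mentioned above, in which users run their own dependency analysis and then prepare a JSON input file, might look roughly like the sketch below. It is an illustration only and not CircularLogo's own code. The function names (``column_freqs``, ``mutual_information``, ``build_motif``) are hypothetical, the field layout simply mirrors the JSON example shown later under Input formats, and several details are assumptions: ``bit`` is computed as 2 minus the Shannon entropy with no small-sample correction, residues are sorted by ascending frequency as in that example, the ``background`` and ``pseudocounts`` values are copied from that example, and the input is assumed to be uppercase DNA (A, C, G, T) of equal length. Plain mutual information stands in here for whatever statistical model you actually use.

```python
import json
import math
from collections import Counter
from itertools import combinations

BASES = ["A", "C", "G", "T"]  # assumption: uppercase DNA only


def column_freqs(seqs, i):
    """Relative frequency of each base at position i."""
    counts = Counter(s[i] for s in seqs)
    return {b: counts.get(b, 0) / len(seqs) for b in BASES}


def mutual_information(seqs, i, j):
    """Mutual information (in bits) between positions i and j."""
    n = len(seqs)
    fi, fj = column_freqs(seqs, i), column_freqs(seqs, j)
    pair_counts = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / (fi[a] * fj[b]))
    return mi


def build_motif(motif_id, seqs):
    """Assemble a dict with the id/nodes/links layout of the JSON example."""
    length = len(seqs[0])
    nodes = []
    for i in range(length):
        freqs = column_freqs(seqs, i)
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        # Ascending frequency order, mirroring the example (an assumption).
        order = sorted(BASES, key=lambda b: freqs[b])
        nodes.append({
            "index": i,
            "label": str(i + 1),
            "bit": 2.0 - entropy,  # assumption: no small-sample correction
            "base": order,
            "freq": [freqs[b] for b in order],
        })
    links = []
    for i, j in combinations(range(length), 2):
        value = mutual_information(seqs, i, j)
        if value > 0:  # keep only pairs that show some dependency
            links.append({"source": i, "target": j, "value": value})
    return {
        "id": motif_id,
        # Values copied from the documentation's example, not derived here.
        "background": {"key": ["A", "T", "C", "G"], "val": [0.25] * 4},
        "pseudocounts": {"key": ["A", "T", "C", "G"], "val": [0.25] * 4},
        "nodes": nodes,
        "links": links,
    }


# Toy input: positions 0 and 1 always change together.
sites = ["ACGT", "ACGA", "TGGT", "TGGA"]
motif = build_motif("demo", sites)
motif_json = json.dumps(motif, indent=1)
```

Writing ``motif_json`` to a file gives a starting point that can then be edited by hand, for example to replace the link values with scores from a different model.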

Run `CircularLogo `_ online
==========================================================================================

CircularLogo online webserver
------------------------------

http://bioinformaticstools.mayo.edu/circularlogo/index.html

Screen shot of CircularLogo webserver
--------------------------------------

.. image:: _static/description1.png
   :height: 700 px
   :width: 950 px
   :scale: 80 %
   :alt: alternate text

Build CircularLogo webserver from source code
==============================================

Users can build the CircularLogo webserver on a laptop or other local computer by following the instructions below.

Step 1 (Prerequisites)
-----------------------

These packages need to be installed before installing CircularLogo:

* `python 2.7.10 `_
* `Django `_
* `biopython `_
* `numpy `_
* `scipy `_

Use pip to install these prerequisites::

    pip install Django
    pip install biopython
    pip install numpy
    pip install scipy

Step 2 (Download, install and start webserver)
----------------------------------------------

Download the CircularLogo source code from `here. `_ ::

    unzip circularlogo_VERSION.zip
    cd circularlogo_VERSION
    python manage.py runserver 0.0.0.0:8000    #start webserver

Step 3 (Use CircularLogo webserver)
--------------------------------------

* Open a web browser and enter this address (http://localhost:8000/circularlogo/index.html) into the URL bar. CircularLogo is now ready to use.
* You can test your local CircularLogo webserver by clicking "Load FASTA Demo" or "Load JSON Demo". Then click "Create Logo"; you should be able to view the logo created from our test data.

.. image:: _static/localhost.png
   :height: 600 px
   :width: 850 px
   :scale: 80 %
   :alt: alternate text

Step 4 (quit webserver)
------------------------

You can quit the CircularLogo webserver at any time by pressing **CONTROL-C**.

Input formats
================

**CircularLogo** takes `JSON `_ or `FASTA `_ format files as input.

When to use a JSON file as input?
-----------------------------------

A JSON file is useful when you have prior knowledge about intra-motif dependencies and your goal is to visualize them.

How to prepare a JSON file?
-----------------------------------

JSON (JavaScript Object Notation) is a plain text format. You can create a JSON file manually or edit an existing one. Note that CircularLogo can also export a JSON file (to serve as a template) when you input a FASTA file. A JSON format file contains three main sections (id, nodes and links) that describe a circular motif::

    {
    "id":"AR",
    "background":{"key":["A","T","C","G"],"val":[0.25,0.25,0.25,0.25]},
    "pseudocounts":{"key":["A","T","C","G"],"val":[0.25,0.25,0.25,0.25]},
    "nodes":[
    {"index":0,"label":"1","bit":0.779265366373,"base":["C","T","G","A"],"freq":[0.006,0.043,0.347,0.604]},
    {"index":1,"label":"2","bit":1.57103613124,"base":["C","T","A","G"],"freq":[0.012,0.017,0.032,0.939]},
    {"index":2,"label":"3","bit":0.609476284231,"base":["G","C","T","A"],"freq":[0.027,0.053,0.388,0.532]},
    {"index":3,"label":"4","bit":1.79237941693,"base":["C","T","G","A"],"freq":[0.001,0.012,0.012,0.976]},
    {"index":4,"label":"5","bit":1.7608900955,"base":["A","G","T","C"],"freq":[0.001,0.012,0.017,0.97]},
    {"index":5,"label":"6","bit":1.35037781295,"base":["C","G","T","A"],"freq":[0.017,0.017,0.079,0.888]},
    {"index":6,"label":"7","bit":0.00719165322846,"base":["A","G","C","T"],"freq":[0.187,0.249,0.28,0.285]},
    {"index":7,"label":"8","bit":0.0651518291752,"base":["G","C","A","T"],"freq":[0.146,0.202,0.305,0.347]},
    {"index":8,"label":"9","bit":-0.00870652334553,"base":["A","G","C","T"],"freq":[0.233,0.238,0.259,0.269]},
    {"index":9,"label":"10","bit":1.43879070277,"base":["G","C","A","T"],"freq":[0.001,0.017,0.084,0.898]},
    {"index":10,"label":"11","bit":1.72492316015,"base":["C","T","A","G"],"freq":[0.006,0.006,0.022,0.965]},
    {"index":11,"label":"12","bit":1.7608900955,"base":["G","C","A","T"],"freq":[0.001,0.012,0.017,0.97]},
    {"index":12,"label":"13","bit":0.525556517288,"base":["C","G","A","T"],"freq":[0.043,0.068,0.341,0.548]},
    {"index":13,"label":"14","bit":1.34114459935,"base":["G","A","T","C"],"freq":[0.017,0.043,0.048,0.893]},
    {"index":14,"label":"15","bit":0.668493029832,"base":["G","A","C","T"],"freq":[0.006,0.068,0.383,0.543]}
    ],
    "links":[
    {"source":3,"target":4,"value":162.623389175},
    {"source":3,"target":11,"value":162.623389175},
    {"source":3,"target":10,"value":160.757409794},
    {"source":4,"target":11,"value":160.757409794},
    {"source":4,"target":10,"value":158.901739691},
    {"source":10,"target":11,"value":158.901739691},
    {"source":1,"target":3,"value":151.613079897},
    {"source":1,"target":4,"value":149.808956186},
    {"source":1,"target":11,"value":149.808956186},
    {"source":1,"target":10,"value":148.015141753},
    {"source":3,"target":9,"value":138.375966495},
    {"source":4,"target":9,"value":136.65431701},
    {"source":9,"target":11,"value":136.65431701},
    {"source":3,"target":13,"value":136.056378866},
    {"source":9,"target":10,"value":134.942976804},
    {"source":3,"target":5,"value":134.778028351},
    {"source":4,"target":13,"value":134.34503866},
    {"source":11,"target":13,"value":134.34503866},
    {"source":4,"target":5,"value":133.076997423},
    {"source":5,"target":11,"value":133.076997423},
    {"source":10,"target":13,"value":132.644007732},
    {"source":5,"target":10,"value":131.386275773},
    {"source":1,"target":9,"value":126.571842784},
    {"source":1,"target":13,"value":124.324420103},
    {"source":1,"target":5,"value":123.118234536},
    {"source":9,"target":13,"value":112.40689433},
    {"source":6,"target":14,"value":10.5821520619},
    {"source":2,"target":8,"value":9.30380154639},
    {"source":6,"target":12,"value":9.16978092784},
    {"source":8,"target":12,"value":9.12854381443}
    ]
    }

* **id section**: Specifies the name of the motif, which is displayed on top of the graph.
* **nodes section**: This section describes the motif residues at each position.

  * index: indicates the order (anticlockwise) of the residues along the circular track. The index starts from 0.
  * label: indicates the label of each position.
  * bit: indicates Shannon's information content of the residues at a position. The method to calculate bit is described `here. `_
  * base: denotes the array of residues. The order of the residues must be consistent with the order of the frequencies in freq. For example, at the position with index = 0, the frequency of each residue is "C" = 0.006, "T" = 0.043, "G" = 0.347, "A" = 0.604.
  * freq: denotes the frequency of each residue.

* **links section**: This section describes the links between motif residues.

  * source: indicates the "index" of the source node (or the start position of the link).
  * target: indicates the "index" of the target node (or the end position of the link).
  * value: indicates the strength of the link between the source and target nodes. The values associated with links are linearly mapped to the "width" of the link ribbons in the output image.

We generally advise putting the smaller index as "source" and the larger as "target", but this is not mandatory since the layout of the motif graph is circular.

CircularLogo will transform the above JSON format file into a circular logo (below), in which the width of each link reflects the strength of dependence between nucleotide stacks.

.. image:: _static/AR.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %
   :alt: alternate text

When to use a FASTA file as input?
-----------------------------------

Use a FASTA file when you have motif sequences (e.g. obtained through de novo motif search in your ChIP-seq peaks) and your goal is to explore the intra-motif dependencies.

How to prepare a FASTA format file?
------------------------------------

`FASTA `_ is also a plain text format (see the example below). When preparing a FASTA format file, it is a good idea to extend each motif site by *X* nucleotides both upstream and downstream.
The extended parts are genomic background, and dependencies calculated from these nucleotides (usually very weak) arguably represent the genomic background noise.

::

    >site1
    GGTACAGTTTGTACA
    >site2
    AATACAGATTGTTCT
    >site3
    AATACAGAGTGTACT
    >site4
    AGAACATAATGTACA
    >site5
    AGTACACTCTGTAAT
    >site6
    GGAACATTTTGTTTT
    >site7
    AGCACAAGATGTTCT
    >site8
    AGTACTTCCTGTTCC
    >site9
    GGTACACTGTGTACT
    >site10
    GGTACAAACTGTTCT
    ...

In this scenario, the input FASTA file is automatically transformed into the JSON format motif representation, with the intra-motif dependencies measured by the chi-square statistic or mutual information. In addition to the chi-square statistic and mutual information, many other statistical methods have been developed to quantify intra-motif dependencies:

* the generalized weight matrix models (Zhou et al. 2004),
* sparse local inhomogeneous mixture (Slim) models (Jens et al. 2011),
* transcription factor flexible models (TFFMs) based on hidden Markov models (Mathelier et al. 2013),
* a binding energy model (BEM) (Zhao et al. 2012),
* inhomogeneous parsimonious Markov (PMMs) models (Eggeling et al. 2015).

**Users need to customize the JSON files (which can be exported from the CircularLogo webserver) if they want to use these methods.**

**NOTE:**

* DNA or RNA sequences must be the same length
* DNA or RNA sequences must not contain line breaks
* Do not mix DNA and RNA sequences together in one file

Which metric to use? Chi-square or Mutual Information?
========================================================

Although both chi-square and mutual information are used to measure dependency, they are two different metrics and should be used under different conditions. Essentially, the chi-square statistic measures the “co-occurrence” of nucleotides at two different positions. Therefore, chi-square is better suited to measuring dependency between two “conserved” (i.e. less variable) positions.
This is because, for two highly variable positions, the frequency of each dinucleotide (which is used to measure “co-occurrence” and to calculate the chi-square statistic) is close to the background frequency (i.e. 1/16), so the chi-square statistic is close to 0. In contrast, mutual information measures the reduction in uncertainty about nucleotide frequencies at one position, given some knowledge of nucleotide frequencies at another position. Therefore, mutual information is better suited to measuring dependency between two highly variable positions. This is because, for a highly conserved position dominated by a particular nucleotide, the uncertainty (Shannon entropy) at that position, and hence its mutual information with any other position, approaches 0 bits.

How are p-values calculated?
==============================

When the chi-square metric is used, the significance of the dependency between two positions is evaluated using the chi-square test. The chi-square metric is calculated as below:

.. image:: _static/chi-square.png
   :height: 150 px
   :width: 400 px
   :scale: 100 %
   :alt: chi-square

When the mutual information metric is used, the significance of the dependency between two positions is evaluated using `Chebyshev's inequality `_. For example, if the observed mutual information is K standard deviations larger than that expected from a random background model, then ``P <= 1/K**2``.

Output examples
=================

Users can download the FASTA file representing HNF6 DNA from `here `_. Alternatively, simply click the "Load FASTA Demo" button.

Method = Mutual Information. P-value cutoff = 1 (displays all links without filtering).

.. image:: _static/HNF6_1.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %
   :alt: HNF6 all

Method = Mutual Information. P-value cutoff = 1 (displays all links without filtering). Focus on node 33 (use the **focus on node** drop-down list to select 33).

.. image:: _static/HNF6_1_33.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %
   :alt: HNF6 motif 18th position centered.

Method = Mutual Information.
P-value cutoff = 1 (displays all links without filtering). Focus on node 5 (use the **focus on node** drop-down list to select 5).

.. image:: _static/HNF6_1_05.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %
   :alt: HNF6 motif 33th position centered.

Method = Mutual Information. **P-value cutoff = 0.01** (CircularLogo automatically filters out links with p-value > 0.01).

.. image:: _static/HNF6_p0.01.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %
   :alt: alternate text

Method = Mutual Information. **P-value cutoff = 0.01** (filters out links with p-value > 0.01). Manually filter out other weak links and focus on node 33.

.. image:: _static/HNF6_p0.01_33.png
   :height: 600 px
   :width: 600 px
   :scale: 80 %
   :alt: alternate text

LICENSE
==========

CircularLogo is distributed under `GNU General Public License (GPLv2) `_

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Contact
====================================

* Zhenqing Ye: Ye.Zhenqing@mayo.edu
* Liguo Wang: Wang.Liguo@mayo.edu