Sequence Alignment Visualization
8.2 years ago
I have five FASTA files and I need to visualize how a particular segment is shared by the sequences. It is similar to a multiple sequence alignment. I have a table which contains the anchor ID, its length, and its starting position in all or some of the sequences. The table looks like the following:

Anchor  Length  S1   S2   S3   S4   S5
1       49      -    100  102  -    105
2       63      201  -    205  200  -
3       75      324  325  -    -    326


Here the first anchor is shared by Sequence 2, Sequence 3 and Sequence 5.

What tool should I use to visualize the above information?

8.2 years ago

If you are comparing aligned sequences within an anchor class, perhaps rotate your table 90 degrees and copy results from a per-anchor multiple alignment with NEEDLE (or a similar tool, depending on your parameters) and paste each line into the anchor-by-sequence cell. Then present that table. So long as your sequences are of equal length and you use a monospaced font, the aligned bases will line up.

If you need a graphical Circos-like figure, consider a five-spindle hive plot. Each of the five spindles is a range representing the minimum to maximum of the sequence start and end positions -- from 100 to 401 -- rescaled to a normalized range of 0 to 1.

Draw three colors of hive ribbons for each of the three classes ("anchors") that connect from one spindle/sequence to the other. The width of the ribbon is the (normalized) length parameter in your table, and a ribbon's start position along a spindle is set by the sequence start position.

For example, for Anchor 1, we use spindles 2, 3 and 5 to represent sequences S2, S3 and S5. The length of every spindle is 401 - 100 or 301 units. Ribbons for Anchor 1 are therefore of width (49 / 301) or 16% of the length of a spindle. We draw a ribbon from S2 to S3. Given a "unit" spindle of normalized length 1, spindle 2's ribbon starts at the 0 position and ends at the 0.16 position. It connects with spindle 3, where it starts at a normalized position of (2 / 301) or 0.01, and ends at 0.01 + 0.16 = 0.17. Another "Anchor 1" ribbon connects S3 to S5, from [0.01, 0.17] on spindle 3 to [0.02, 0.18] on spindle 5.

This process is repeated for the other two anchors. Anchor 3 ribbons would mostly hug the outer range of the spindles, since they start at ((324 - 100) / 301) = 0.74 and work their way out very slightly.
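The normalization above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of any hive plot tool: the `ANCHORS` dictionary is a hand-typed copy of the table in the question, and the function names `spindle_range` and `ribbon_intervals` are made up for this example.

```python
# Anchor table from the question: anchor ID -> (length, {sequence: start}).
# Sequences that do not contain an anchor are simply left out.
ANCHORS = {
    1: (49, {"S2": 100, "S3": 102, "S5": 105}),
    2: (63, {"S1": 201, "S3": 205, "S4": 200}),
    3: (75, {"S1": 324, "S2": 325, "S5": 326}),
}


def spindle_range(anchors):
    """Return (smallest start, largest end) over all anchor occurrences."""
    starts = [s for _, pos in anchors.values() for s in pos.values()]
    ends = [s + length for length, pos in anchors.values() for s in pos.values()]
    return min(starts), max(ends)


def ribbon_intervals(anchors):
    """Rescale every anchor occurrence to a [start, end] interval on a unit spindle."""
    lo, hi = spindle_range(anchors)
    span = hi - lo
    return {
        anchor: {
            seq: ((s - lo) / span, (s - lo + length) / span)
            for seq, s in pos.items()
        }
        for anchor, (length, pos) in anchors.items()
    }


if __name__ == "__main__":
    for anchor, per_seq in ribbon_intervals(ANCHORS).items():
        for seq, (start, end) in per_seq.items():
            print(f"Anchor {anchor} on {seq}: [{start:.2f}, {end:.2f}]")
```

The resulting intervals are what you would hand to whichever plotting tool draws the ribbons.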

You could use three colors to denote ribbons for each anchor-class. Further, because ribbons would overlap, you could apply a fractional opacity to the three ribbon colors. This allows the viewer to see connections that span intermediate spindles, which might overlap ribbons from other anchor-classes.

One tool I have used to make publication-quality hive plots is the R package HiveR. There is also a d3-based version for web visualizations. This variant is probably a close rendition of your scenario, but with different widths, start and end positions for ribbons, and two more spindles:

Hi Alex,

Thanks! I am actually looking for a tool similar to http://circos.ca/ or a tool whose output is five horizontal lines with edges connecting anchors.

Thanks again. Figures like the following would be a little better.

Yeah, you can definitely make that with Circos!

I was trying that, but I am still struggling with the data table format: http://circos.ca/tutorials/lessons/configuration/data_files/

Could you please suggest what modifications need to be made? I was also looking at the online Circos table viewer, but could not understand what the table represents: http://mkweb.bcgsc.ca/tableviewer/
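One possible way to get from the anchor table to Circos input is sketched below. It writes plain-text lines in the karyotype format (`chr - ID LABEL START END COLOR`) and the link format (`chr1 start1 end1 chr2 start2 end2 options`) described in the Circos data file tutorial. Several things here are assumptions: the true sequence lengths are not given in this thread, so `SEQ_LENGTHS` uses a placeholder of 500 for every sequence; the lowercase IDs, the colors, and the function names are made up for illustration; and linking each sequence only to the next one that shares the anchor is just one choice.

```python
# Anchor table from the question: anchor ID -> (length, {sequence: start}).
ANCHORS = {
    1: (49, {"S2": 100, "S3": 102, "S5": 105}),
    2: (63, {"S1": 201, "S3": 205, "S4": 200}),
    3: (75, {"S1": 324, "S2": 325, "S5": 326}),
}

# ASSUMPTION: real sequence lengths are unknown here; 500 is a placeholder.
SEQ_LENGTHS = {"S1": 500, "S2": 500, "S3": 500, "S4": 500, "S5": 500}

# ASSUMPTION: one arbitrary color per anchor.
ANCHOR_COLORS = {1: "red", 2: "green", 3: "blue"}


def karyotype_lines(seq_lengths):
    """One 'chr' line per sequence, so each sequence becomes an ideogram."""
    return [
        f"chr - {name.lower()} {name} 0 {length} grey"
        for name, length in sorted(seq_lengths.items())
    ]


def link_lines(anchors, colors=ANCHOR_COLORS):
    """One link per pair of neighbouring sequences that share an anchor."""
    lines = []
    for anchor, (length, pos) in sorted(anchors.items()):
        seqs = sorted(pos)
        for a, b in zip(seqs, seqs[1:]):
            lines.append(
                f"{a.lower()} {pos[a]} {pos[a] + length} "
                f"{b.lower()} {pos[b]} {pos[b] + length} "
                f"color={colors[anchor]}"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(karyotype_lines(SEQ_LENGTHS)))
    print()
    print("\n".join(link_lines(ANCHORS)))
```

The two blocks of output would be saved to separate files and referenced from the Circos configuration (the karyotype file and a `<link>` block); the configuration itself is not shown here.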

