Question: Mauve Similarity plots - similar or not?
rrcutler (United States) wrote, 23 months ago:

Hello all. I have been using Mauve to visualize draft genome assemblies compared against a reference genome, and I have been reading the website on how to interpret the similarity plots displayed in the viewer. Specifically, it says: "The height of the similarity profile is calculated to be inversely proportional to the average alignment column entropy over a region of the alignment." I understand this to mean that the greater the height, the more similar the two sequences are (correct me if I'm wrong). However, when I choose to display the similarity ranges, I have trouble interpreting them.

So it seems that when comparing similar sequences, the similarity plots (purple) should be fairly high and full, as in this one: mauve plot

When I select both the similarity plot and the similarity ranges for display, my plots look like this (the similarity range lines are dark purple):

mauve plot

Furthermore, I visualized two identical sequences in Mauve and got this:

mauve plot

So how do I interpret what the similarity plot ranges mean? My guess is that the higher a similarity range line is, the narrower the range of differences between the sequences. The spiky purple shading also seems to indicate ranges, but it looks more exaggerated than the dark purple line.

Also, in the Mauve output, what are the .backbone and .bbcols files?

Many Thanks

aaron.darling wrote, 23 months ago:

Starting with the 2.4 releases, Mauve no longer shades the entire area under the similarity curve, but instead draws a bold line for the median similarity value over a region. The 'ranges', which create the fattened, jagged areas in lighter purple in the 2nd plot of your post, show the range of similarity values in that region. This adds context to the median line, so that it is possible to see regions where small pieces may be missing or changed that are below the resolution of the current zoom level. As a result, the plot can become visually intense, to the point that I thought of dubbing the 2.4 release the 'visual assault' release. I am certain that Mauve is violating some fundamental laws and regulations of data visualization.

In your 3rd plot the two sequences are identical, so the median similarity is 100% and the range covers 100%-100%; it all appears as a single purple line.
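
As a rough illustration of the median-and-range idea described above (a sketch only, not Mauve's actual code), the following Python computes a per-column similarity from alignment column entropy and then summarizes each group of columns by its median, minimum, and maximum. The function names, the `bin_size` parameter, and the mapping "similarity = 1 - entropy / log2(number of sequences)" are all assumptions made for the example; the Mauve documentation only states that the height is inversely proportional to the average column entropy.

```python
import math
from collections import Counter
from statistics import median


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def similarity_profile(aligned_seqs):
    """Per-column similarity in [0, 1].

    Assumption for this sketch: similarity = 1 - entropy / log2(n_sequences),
    so a fully conserved column scores 1.0.
    """
    n = len(aligned_seqs)
    max_entropy = math.log2(n) if n > 1 else 1.0
    return [1.0 - column_entropy(col) / max_entropy for col in zip(*aligned_seqs)]


def summarize(profile, bin_size):
    """Return (median, min, max) for each consecutive bin of columns.

    Each bin stands in for one drawn position at the current zoom level:
    the median is the bold line, min..max is the lighter 'range' area.
    """
    summary = []
    for start in range(0, len(profile), bin_size):
        chunk = profile[start:start + bin_size]
        summary.append((median(chunk), min(chunk), max(chunk)))
    return summary


# Two identical sequences: median and range coincide at 1.0 everywhere.
identical = summarize(similarity_profile(["ACGTACGT", "ACGTACGT"]), bin_size=4)

# One mismatch in the last column: the median of that bin is still 1.0,
# but the range drops to 0.0, exposing a change smaller than the bin.
one_snp = summarize(similarity_profile(["ACGTACGT", "ACGTACGA"]), bin_size=4)
```

In this toy case the second bin of `one_snp` has a median of 1.0 but a minimum of 0.0, which is the situation the range shading is meant to reveal: a small difference that the median line alone would hide at that zoom level.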

The user guide contains a description of the backbone file, so I won't recapitulate it here.

Hope that helps!


@Dr. Darling: Glad to have you on Biostars. Now we can get authoritative support/answers for Mauve!

(reply by genomax, 23 months ago)

Thanks a lot for the clarification!

(reply by rrcutler, 22 months ago)