Question: Principled structural variation detection with assembled genomes or PacBio reads
4.8 years ago by
United States
hbw70 wrote:

I want to compare my assembled genome with a reference to find structural variation. Most SV methods, such as SVDetect and BreakDancer, map resequenced paired-end reads to the reference. However, my main source of long-range information is PacBio reads. I have also done a hybrid assembly. I have two questions:

1. Is there a principled way and an established pipeline to use PacBio reads for structural variation detection? I know raw PacBio reads will be problematic because of their high error rates. One approach is to map raw reads to a region of interest, then do a local assembly and polishing. Is there an established pipeline for this?

2. A different approach would be to use assembled genomes. I can compare the genomes with MUMmer. Is there standard software that people use to turn the MUMmer output into a list of structural variations? Is there a way to gain confidence about what is a misassembly versus a true structural variation? I know Sibelia uses whole-genome alignment, but it is advertised for microorganisms. Would it work for more complex genomes such as plants?


Tools that use PacBio reads to identify SVs don't require error correction. That means they should handle the high error rates in the reads. (If you know of any tool that uses corrected reads, let me know.)
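One reason uncorrected reads can still be usable: sequencing errors in raw long reads are mostly small indels and mismatches, whereas SVs are usually defined as events of roughly 50 bp or more, so a size threshold separates the two to a first approximation. The sketch below is only an illustration of that idea, not the algorithm of any particular caller. The function name `large_indels` and the `min_len=50` default are assumptions made for this example, and real tools also cluster evidence across many reads and use split alignments.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def large_indels(cigar, ref_start, min_len=50):
    """Return (kind, ref_pos, length) for each insertion or deletion in a
    read alignment that is at least min_len bases long.

    Small indels, which in raw long reads are mostly sequencing errors,
    fall below the threshold and are ignored.
    """
    events = []
    ref_pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I" and length >= min_len:
            events.append(("INS", ref_pos, length))
        elif op == "D" and length >= min_len:
            events.append(("DEL", ref_pos, length))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            ref_pos += length
    return events


if __name__ == "__main__":
    # Hypothetical alignment: two small error-like indels, one large
    # deletion and one large insertion.
    for event in large_indels("100M2I50M1D30M120D200M300I10M", ref_start=1000):
        print(event)
```

A single read supporting an event is weak evidence; in practice one would want several independent reads reporting a similar position and length before trusting a call.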

written 3.0 years ago by Medhat8.7k
4.5 years ago by
United States
ivminkin0 wrote:


Sibelia is useful for datasets smaller than 500 MB. A version for larger genomes is in progress and will probably be released within a year. You can take a look at Cactus as an alternative.

7 months ago by
Manish10 wrote:

I guess it is too late to answer the original question, but I will still answer for anyone else who wants to find variations from assemblies.

We developed a method, SyRI, to identify structural differences between whole-genome assemblies using alignments as input. It identifies syntenic (conserved) regions as well as structurally rearranged regions (inversions, transpositions, translocations, segmental (distal) duplications, and tandem duplications). It also reports local variations (SNPs, indels, CNVs) within syntenic and structurally rearranged regions, providing a hierarchy of variations.
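To give a feel for what "alignments as input" means, here is a toy sketch that flags candidate inversions from whole-genome alignment blocks. It is not SyRI's algorithm, which is considerably more involved. It assumes MUMmer-style coordinates in which a reverse-strand alignment is reported with the query start greater than the query end; the names `Block`, `orientation` and `candidate_inversions`, and the `min_len` threshold, are made up for this example.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One alignment block between reference and query (1-based, inclusive)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


def orientation(block):
    """Assumed convention: query coordinates run backwards for
    reverse-strand alignments."""
    return "inverted" if block.qry_start > block.qry_end else "forward"


def candidate_inversions(blocks, min_len=1000):
    """Return inverted blocks spanning at least min_len reference bases,
    sorted by reference start. Short inverted hits are dropped because
    they are often repeats rather than true inversions."""
    hits = []
    for b in sorted(blocks, key=lambda b: b.ref_start):
        ref_len = abs(b.ref_end - b.ref_start) + 1
        if orientation(b) == "inverted" and ref_len >= min_len:
            hits.append(b)
    return hits


if __name__ == "__main__":
    # Hypothetical blocks for a single chromosome pair.
    blocks = [
        Block(1, 5000, 1, 5000),
        Block(5001, 9000, 9000, 5001),    # long inverted block
        Block(9001, 9200, 9200, 9001),    # short inverted hit, filtered out
        Block(9201, 15000, 9201, 15000),
    ]
    for b in candidate_inversions(blocks):
        print(b)
```

A flagged block is only a candidate: the same signal can come from a misassembly, so it needs independent support such as reads spanning the breakpoints.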

More information is in the paper, and the method is on GitHub.

It can work with large assemblies, but it does not differentiate between misassembly and structural variation.
