Question: Approaches for SV calling from De Novo assembly
3
3.1 years ago by
novice890
United States
novice890 wrote:

Given a de novo assembly and a reference assembly, what methods have you tried / would you recommend for determining structural variations? 
 

 

assembly sv • 1.2k views
modified 2.9 years ago by QVINTVS_FABIVS_MAXIMVS2.2k • written 3.1 years ago by novice890
6
2.9 years ago by
USA SoCal
QVINTVS_FABIVS_MAXIMVS2.2k wrote:

For the paper "An integrated map of structural variation in 2,504 human genomes", my tiny part was to validate complex structural variation in long-read TruSeq data.

That was about 3 years ago, so I'm not sure whether there are better approaches now. We created breakpoint contigs across putative SV breakpoints using Velvet.

Then I took the breakpoint contigs (Velvet generates many possible ones) and used BLAT to align them to the reference genome.

Using the BLAT results I was able to parse out the precise breakpoints.

It's rather labor-intensive, and I'm sure there's a better way of doing it, but this might be a good lead!

written 2.9 years ago by QVINTVS_FABIVS_MAXIMVS2.2k
3

Thank you for the answer. I really appreciate the detailed supplementary material accompanying your paper. However, it doesn't seem to go into how BLAT was used. Could you please explain how you inferred breakpoints from the BLAT alignments?

It's been a long time and I have already used a combination of different methods for my purpose (similar to those of your collaborators), but I'm definitely interested in learning your method.

written 2.8 years ago by novice890
3

I attached this visual aid: [image: BLAT alignments]

For the contig you generated across a breakpoint, you align it to the reference genome and look for alignments with high percent identity. You expect the sequence to match at close to 100% identity.
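As a rough sketch of that filtering step, assuming BLAT's default tab-separated PSL output: the identity here is a simplified ratio taken from the first three PSL columns (matches, misMatches, repMatches), not the score the UCSC browser reports, and the 98% cutoff and function names are only illustrative.

```python
def psl_identity(fields):
    """Approximate percent identity from the first three PSL columns."""
    matches, mismatches, rep_matches = (int(x) for x in fields[:3])
    aligned = matches + mismatches + rep_matches
    return 100.0 * (matches + rep_matches) / aligned if aligned else 0.0


def high_identity_hits(psl_lines, min_identity=98.0):
    """Yield split PSL rows whose approximate identity meets the cutoff."""
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header and blank lines: data rows have 21 columns
        # and start with the integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        if psl_identity(fields) >= min_identity:
            yield fields
```

Each yielded row is the list of PSL columns, so `row[9]` is the contig name and `row[13]` is the reference sequence it hit.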

In this example the deletion on the right is evident because the breakpoint contig aligns to two non-contiguous parts of the reference. The number of reference base pairs between the end of the first aligned segment and the start of the second is the size of the deletion.
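The arithmetic can be sketched like this, assuming both aligned segments show up as blocks of a single PSL row (BLAT may also report them as separate rows, which this does not handle). The inputs are the PSL blockSizes, qStarts and tStarts columns; the function name and the `max_query_gap` parameter are made up for illustration.

```python
def deletions_from_blocks(block_sizes, q_starts, t_starts, max_query_gap=0):
    """Return (left_breakpoint, right_breakpoint, size) for each reference gap.

    Coordinates are 0-based, half-open reference positions as in PSL.
    A gap counts as a deletion only if the contig itself is (nearly)
    continuous across it, i.e. the query gap is at most max_query_gap.
    """
    sizes = [int(x) for x in block_sizes.strip(",").split(",")]
    q = [int(x) for x in q_starts.strip(",").split(",")]
    t = [int(x) for x in t_starts.strip(",").split(",")]
    deletions = []
    for i in range(len(sizes) - 1):
        q_gap = q[i + 1] - (q[i] + sizes[i])   # bases skipped in the contig
        t_gap = t[i + 1] - (t[i] + sizes[i])   # bases skipped in the reference
        if q_gap <= max_query_gap and t_gap > 0:
            deletions.append((t[i] + sizes[i], t[i + 1], t_gap))
    return deletions


# Two 100 bp blocks, adjacent in the contig, 500 bp apart on the reference.
print(deletions_from_blocks("100,100,", "0,100,", "1000,1600,"))
# [(1100, 1600, 500)]
```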

The command-line version of BLAT (download it from the UCSC Genome Browser under Tools) gives you output that makes parsing alignments easy.
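For reference, a minimal sketch of driving command-line BLAT from a script. The file names are hypothetical and the 95% identity cutoff is an arbitrary choice, not a value from the paper. `-noHead` drops the PSL header so every output line is an alignment row.

```python
def blat_command(reference, contigs, out_psl, min_identity=95):
    """Build (but do not run) a BLAT command line as an argument list.

    BLAT's usage is `blat database query output.psl`; the default
    output format is PSL.
    """
    return [
        "blat",
        f"-minIdentity={min_identity}",
        "-noHead",
        reference,
        contigs,
        out_psl,
    ]


cmd = blat_command("reference.fa", "breakpoint_contigs.fa", "contigs_vs_ref.psl")
print(" ".join(cmd))

# To actually run it (requires the blat binary on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```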

modified 2.8 years ago • written 2.8 years ago by QVINTVS_FABIVS_MAXIMVS2.2k
3

Brilliant. Thank you for the explanation.

Quick question: could you use BLAST instead of BLAT? I'm wondering if there's a specific reason you chose BLAT.

written 2.8 years ago by novice890
2

I think BLAT works better for short sequences? Also, you can download a command-line version of BLAT. My PI uses it for primers, but I don't see why you couldn't use BLAST.

I think BLAT is faster too, no?

Also, BLAT output from the command line includes the number of aligned segments. A count of 1 means no SV; exactly 2 would be a simple SV such as the deletion above; more than 2 indicates a complex SV (DUP-INV-DUP, DEL-DUP, etc.). I found this really informative once I had written a script to parse the BLAT output (worth doing if you have a lot of breakpoints to test).
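A minimal sketch of what such a parsing script might look like. It is not the original script: it assumes the "number of aligned segments" is the blockCount column (column 18) of PSL output, and the label for exactly two blocks is an assumption read from the deletion example. Small indels also split an alignment into blocks, so a real script would probably filter on gap size as well.

```python
def classify_by_block_count(psl_lines):
    """Return (contig, block_count, label) for every PSL alignment row."""
    calls = []
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header and blank lines: data rows have 21 columns
        # and start with the integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        contig = fields[9]            # qName
        block_count = int(fields[17])  # blockCount
        if block_count == 1:
            label = "no SV"
        elif block_count == 2:
            label = "simple SV"
        else:
            label = "complex SV"
        calls.append((contig, block_count, label))
    return calls
```

Something like `classify_by_block_count(open("contigs_vs_ref.psl"))` (hypothetical file name) then gives one call per alignment row, which is easy to tabulate across many breakpoints.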

modified 6 months ago by RamRS20k • written 2.8 years ago by QVINTVS_FABIVS_MAXIMVS2.2k

For my work (yeast) BLAST is actually much faster for some reason. But I didn't know that about alignment segments. When I tried BLAT out, I just formatted the output like BLAST (-out=blast8). Interesting!

written 2.8 years ago by novice890