Question

compare SV results across samples - Manta, Delly

2

6.0 years ago

Richard ▴ 590

Hi folks,

I have Delly and Manta results for a cohort of WGS tumour libraries. Some of the libraries are derived from the same sources using different methods in the lab and I want to be able to compare/contrast the sets of SV calls generated by the tools.

Is there any magic sauce out there that can take in Manta or Delly results from multiple libraries and create Venn-diagram-like results? I can imagine such a tool would split the candidate calls into subsets corresponding to the different types of SVs and then do either exact or approximate matching to determine whether a variant is common to multiple files. Is there anything out there that will take care of this?

manta delly wgs illumina • 7.5k views

Updated 2.6 years ago by dlekrud456 • 0 • written 6.0 years ago by Richard ▴ 590

score 6 · Answer 1 · 2018-05-09

6.0 years ago

WouterDeCoster 47k

SURVIVOR is a tool for finding overlaps between structural variant calls. Creating a Venn diagram afterwards requires a few lines of code; I'll add a Python example below:
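A minimal sketch of that step (not the answerer's original code), assuming the merged VCF carries a SUPP_VEC entry in the INFO column with one 0/1 digit per input file, which is how SURVIVOR merge output is laid out to my knowledge. The function name and the example records are hypothetical. It only counts how many merged calls fall into each Venn region; plotting is left to whichever library you prefer.

```python
from collections import Counter


def supp_vec_counts(vcf_lines):
    """Count merged calls per SUPP_VEC pattern.

    A pattern such as '110' is read as: supported by input files 1 and 2,
    not by file 3 (assumed layout of the SUPP_VEC INFO entry).
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            if entry.startswith("SUPP_VEC="):
                counts[entry[len("SUPP_VEC="):]] += 1
                break
    return counts


# Hypothetical merged records from two input files.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSUPP=2;SUPP_VEC=11;SVTYPE=DEL",
    "chr1\t9000\tsv2\tN\t<DUP>\t.\tPASS\tSUPP=1;SUPP_VEC=10;SVTYPE=DUP",
    "chr2\t5000\tsv3\tN\t<DEL>\t.\tPASS\tSUPP=2;SUPP_VEC=11;SVTYPE=DEL",
    "chr3\t7000\tsv4\tN\t<INV>\t.\tPASS\tSUPP=1;SUPP_VEC=01;SVTYPE=INV",
]

counts = supp_vec_counts(example)
for pattern, n in sorted(counts.items()):
    print(pattern, n)
```

For real data you would pass an open file handle instead of the list. The region counts keyed by '10', '01' and '11' are in the shape that Venn plotting packages such as matplotlib-venn generally expect, but check the documentation of the package you use before relying on that.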

1

Does it support the non-standard VCF notation that Manta and Delly use? Notably, both of these callers use their own (different!) custom fields for single inversion-like breakpoints (intra-chromosomal events in which the breakend orientation is the same on both sides).

6.0 years ago by d-cameron ★ 2.9k

1

Yes. I have used it multiple times for Delly, Manta and Lumpy.

5.8 years ago by fritz.sedlazeck ▴ 40

0

Hi, I get errors when drawing Venn diagrams from SURVIVOR merge output. Would you be able to help with this issue? Thanks so much for your help in advance! https://github.com/fritzsedlazeck/SURVIVOR/issues/151#ref-issue-621436369

2.6 years ago by dlekrud456 • 0

score 3 · Answer 2 · 2018-05-09

6.0 years ago

Len Trigg ★ 1.6k

Comparing SVs across callers is certainly not an easy problem. There is active work in the GIAB consortium on deriving high-quality SV call sets and developing tools for comparing call sets, so there are several options under current development. You will probably end up trying a few tools to see which ones best meet your particular needs. For example, are your calls primarily sequence-resolved? Are they represented using high-level SV event types (DEL/DUP, etc.), as low-level break-ends, or as a mixture of these?

As well as SURVIVOR (which Wouter already mentioned), you might look at truvari and SVanalyzer. Our RTG Tools also includes the svdecompose and bndeval commands to facilitate comparing SVs at the break-end level (creating outputs similar to what vcfeval does for small variants, if you are familiar with that).

0

Do any of the tools take repeat homology into account? I've had issues matching long-read variant calls with short-read ones for ME expansions.

6.0 years ago by d-cameron ★ 2.9k

0

The SVanalyzer SVcomp tool does take repeat homology into account, since it constructs the resulting haplotypes and compares them. I haven't tried it myself yet, though, so I am not sure whether it also works with non-sequence-resolved calls.

6.0 years ago by Len Trigg ★ 1.6k

score 1 · Answer 3 · 2018-05-10

Is there any magic sauce out there that can take in Manta or Delly results from multiple libraries and create Venn-diagram-like results? I can imagine such a tool would split the candidate calls into subsets corresponding to the different types of SVs and then do either exact or approximate matching to determine whether a variant is common to multiple files. Is there anything out there that will take care of this?

If you're comfortable in R, my StructuralVariantAnnotation tool will convert Manta and Delly calls (and those of many other callers) into a standardised breakpoint notation, which you can then match in the usual Bioconductor way using findBreakpointOverlaps(). It handles many of the complications that arise when matching SVs, including support for inexact calls (CIPOS/CIEND) and breakpoint homology.

It doesn't do the Venn diagrams, but it will tell you which calls match which, and there are plenty of R plotting libraries available that you can use.

As a proof of concept, I've built a benchmarking Shiny app on top of this library.
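To illustrate the idea behind matching inexact calls, here is a minimal Python sketch. It is not the StructuralVariantAnnotation API and does not reproduce what findBreakpointOverlaps() does internally: all class and function names are hypothetical, breakpoint homology is ignored, and each breakend is simply assumed to be an interval obtained by widening the called position by its CIPOS/CIEND bounds. Two breakpoints are treated as the same event when both breakend intervals overlap on the same chromosome with the same orientation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakend:
    chrom: str
    start: int   # called position plus lower confidence bound
    end: int     # called position plus upper confidence bound
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Breakpoint:
    name: str
    left: Breakend
    right: Breakend


def breakend(chrom, pos, ci, strand):
    """Widen a called position by its (lower, upper) confidence interval."""
    return Breakend(chrom, pos + ci[0], pos + ci[1], strand)


def breakends_overlap(a, b, max_gap=0):
    """Same chromosome and orientation, intervals within max_gap of each other."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start <= b.end + max_gap and b.start <= a.end + max_gap)


def breakpoints_match(x, y, max_gap=0):
    """Both sides of the breakpoint must overlap."""
    return (breakends_overlap(x.left, y.left, max_gap)
            and breakends_overlap(x.right, y.right, max_gap))


def find_matches(calls_a, calls_b, max_gap=0):
    """All-against-all comparison; fine for a sketch, slow for large call sets."""
    return [(a.name, b.name)
            for a in calls_a for b in calls_b
            if breakpoints_match(a, b, max_gap)]


# Hypothetical calls from two callers for the same sample.
manta = [
    Breakpoint("manta_del",
               breakend("chr1", 1000, (-10, 10), "+"),
               breakend("chr1", 5000, (-10, 10), "-")),
]
delly = [
    Breakpoint("delly_del",
               breakend("chr1", 1008, (-5, 5), "+"),
               breakend("chr1", 4995, (-5, 5), "-")),
    Breakpoint("delly_inv",  # same positions, different orientation
               breakend("chr1", 1008, (-5, 5), "+"),
               breakend("chr1", 4995, (-5, 5), "+")),
    Breakpoint("delly_far",
               breakend("chr1", 20000, (-5, 5), "+"),
               breakend("chr1", 25000, (-5, 5), "-")),
]

matches = find_matches(manta, delly)
print(matches)
```

Requiring the orientation to agree is what keeps the inversion-like call from being matched to the deletion even though the coordinates coincide.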