And what do SV caller merging apps actually do? Do they just merge the VCF files, which we could also do ourselves? Enlighten me.
SV merging is non-trivial because of the notational and detection differences between the various tools. Even getting their output into a standard format is a challenge in itself. E.g. BreakDancer, Socrates, HYDRA, and GRIDSS (my tool, I highly recommend it ;) report all events in VCF breakend notation. Other tools use the alternate SVTYPE=INS/DEL/INV/DUP notation, and others report the REF and ALT base sequences directly. Determining that the BND pair of records from one caller, the DUP call from another, and the ALT sequence that is longer than the REF from a third caller are actually the same call is a non-trivial task. On top of this, CNV callers are fundamentally different in that they report (changes in) abundance of DNA segments instead of the novel DNA sequence adjacencies that breakpoint callers report. Add inexact calling and sequence homology on top of that and you have quite the task ahead.
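To make the notation problem concrete, here is a minimal Python sketch that reduces three different ways of writing the same simple deletion (a BND record, a symbolic SVTYPE=DEL record with END, and explicit REF/ALT bases) to one breakpoint representation. The function names and the strand convention are my own assumptions for illustration, not the API of any tool mentioned here. It only handles the easy deletion case: recognising that a longer ALT sequence is really a tandem DUP also needs a comparison against the reference genome, which this sketch does not attempt.

```python
import re

# A breakend is (chrom, pos, strand). Assumed convention for this sketch:
#   '+' = the retained reference sequence lies to the left and ends at pos
#   '-' = the retained reference sequence lies to the right and starts at pos

def make_breakpoint(end_a, end_b):
    """Sort the two breakends so the same adjacency always compares equal."""
    return tuple(sorted([end_a, end_b]))

# VCF breakend ALT forms: t[p[  t]p]  ]p]t  [p[t
_BND = re.compile(r"^([^\[\]]*)([\[\]])(.+):(\d+)[\[\]]([^\[\]]*)$")

def from_bnd(chrom, pos, alt):
    """Breakpoint from one BND record (either mate of the pair)."""
    m = _BND.match(alt)
    if m is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    prefix, bracket, mate_chrom, mate_pos, _suffix = m.groups()
    # Bases before the bracket mean the local sequence is kept on the left.
    local_strand = "+" if prefix else "-"
    # '[' means the joined piece extends to the right of the mate position.
    mate_strand = "-" if bracket == "[" else "+"
    return make_breakpoint((chrom, pos, local_strand),
                           (mate_chrom, int(mate_pos), mate_strand))

def from_symbolic_del(chrom, pos, end):
    """Breakpoint from a symbolic <DEL>: bases pos+1..end are removed."""
    return make_breakpoint((chrom, pos, "+"), (chrom, end + 1, "-"))

def from_ref_alt_del(chrom, pos, ref, alt):
    """Breakpoint from a deletion written as padded REF/ALT bases."""
    if not (len(alt) == 1 and len(ref) > 1 and ref.startswith(alt)):
        raise ValueError("only simple padded deletions are handled here")
    return make_breakpoint((chrom, pos, "+"), (chrom, pos + len(ref), "-"))

# The same 50 bp deletion, written three ways:
a = from_bnd("chr1", 100, "A[chr1:151[")
b = from_symbolic_del("chr1", 100, 150)
c = from_ref_alt_del("chr1", 100, "A" + "C" * 50, "A")
print(a == b == c)
```

Even in this toy case, each notation needs its own off-by-one handling before the calls can be compared at all.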
I have an R package (https://github.com/PapenfussLab/StructuralVariantAnnotation) that addresses the matching of calls from breakpoint-based callers but it doesn't convert that into a consensus call set, nor does it handle CNV calls.
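As a rough idea of what "matching" breakpoint calls involves, here is a small Python sketch. It is not the package's API (the package is R); the function names, the tuple layout, and the 10 bp margin are assumptions made up for this illustration. Two calls are treated as the same event when both breakends agree on chromosome and orientation and lie within the margin of each other, which is one simple way to tolerate inexact calls.

```python
# Each call is ((chrom, pos, strand), (chrom, pos, strand)), with the two
# breakends already sorted into a consistent order (hypothetical layout).

def breakpoints_match(bp_a, bp_b, margin=10):
    """True if both breakends agree on chromosome and strand and their
    positions differ by at most `margin` bases."""
    for (chrom_a, pos_a, strand_a), (chrom_b, pos_b, strand_b) in zip(bp_a, bp_b):
        if chrom_a != chrom_b or strand_a != strand_b:
            return False
        if abs(pos_a - pos_b) > margin:
            return False
    return True

def match_call_sets(calls_a, calls_b, margin=10):
    """Return (i, j) index pairs of calls that match between two callers."""
    return [
        (i, j)
        for i, a in enumerate(calls_a)
        for j, b in enumerate(calls_b)
        if breakpoints_match(a, b, margin)
    ]

caller_a = [
    (("chr1", 100, "+"), ("chr1", 151, "-")),
    (("chr2", 500, "+"), ("chr5", 900, "+")),
]
caller_b = [
    (("chr1", 103, "+"), ("chr1", 149, "-")),  # same deletion, slightly off
    (("chr2", 500, "+"), ("chr5", 900, "-")),  # same positions, other orientation
]
print(match_call_sets(caller_a, caller_b))
```

Note that this only pairs up calls. Turning the pairs into a consensus call set (which position to report, how to handle one-to-many matches) is a separate problem.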
I need suggestions for better results in SV detection.
Running multiple callers to ensure coverage of the range of SVs you're interested in is a good approach (e.g. a general purpose SV breakpoint caller, a specialised microsatellite caller, and a CNV caller). Generating a consensus call set based on multiple callers of the same type (e.g. pindel+delly+lumpy+manta+gridss) does not necessarily give you better results. There is considerable overlap in false positives between callers using the same methods, and in many cases you're better off just using the results of the best-in-class caller.
As you only have WES: what classes of SVs are you hoping to detect?