Question: How to combine multiple tools to detect SVs in WES data
14 days ago by
vivekruhela10 wrote:


I want to use multiple tools (e.g. GATK, Splitread, etc.) to detect structural variation in WES data. I can run each of them individually, but I would like to combine them to get better results, and I don't know how to do that. I need suggestions for better results in SV detection.


snp sequence next-gen R • 161 views
modified 7 days ago • written 14 days ago by vivekruhela10

In my experience, the best combination is pindel + CNVkit + ONCOcnv ;-)

written 14 days ago by Korsocius80

@Korsocius: Thanks for the reply. I was planning to use the combination GATK + Splitread + Sprites because I want fewer false positives, a good F-score, and more novel SVs. Maybe I am wrong. Can you tell me why pindel + CNVkit + ONCOcnv is a good combination? Also, what do SV caller merging tools actually do when they merge VCF files, and is that something we could do ourselves? Please enlighten me.

written 14 days ago by vivekruhela10
7 days ago by
vivekruhela10 wrote:

I am using CombineVariants to merge the VCF files obtained from GATK (HaplotypeCaller), samtools, and pindel. With it I can either extract the intersection, i.e. the variants common to all VCF files, or keep the union of the variants found by all of the callers.

written 7 days ago by vivekruhela10

Based on my reading of the documentation, it looks like CombineVariants is not SV-aware and will only work for SNVs and small indels. Variant calls using SVTYPE notation are likely to be incorrectly merged by that tool.

This may or may not be acceptable for your use case.

modified 1 day ago • written 1 day ago by d-cameron1.2k

Sorry for the late response. I have checked the CombineVariants documentation, and it says nothing about being SV-aware or about being limited to SNPs and indels. What does your experience say about this, and what is the reason? I'm using the genotype merge option PRIORITIZE to merge the VCF files of the same sample. What errors could that cause? Thanks for your reply. Please let me know the reasons as soon as possible.

written 7 hours ago by vivekruhela10

It's easiest to show an example. The following records are all just different representations of the same variant. If a tool doesn't explicitly handle every representation, it won't merge them correctly, and that's even before CIPOS has to be considered.

contig    3    ins_indel_representation1    A    ACTCAG    .    .    
contig    4    ins_indel_representation2    G    CTCAGG    .    .    
contig    3    ins_svtype_representation1    A    <INS>    .    .    SVTYPE=INS;SVLEN=5;END=3
contig    4    ins_svtype_representation2    G    <INS>    .    .    SVTYPE=INS;SVLEN=5;END=4
contig    3    ins_bnd_1    A    ACTCAG[contig:4[    .    .    SVTYPE=BND;PARID=ins_bnd_2;EVENT=example_ins
contig    4    ins_bnd_2    G    ]contig:3]CTCAGG    .    .    SVTYPE=BND;PARID=ins_bnd_1;EVENT=example_ins
modified 7 hours ago • written 7 hours ago by d-cameron1.2k
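
To make the point concrete, here is a minimal Python sketch of a hypothetical merger that only treats two records as the same variant when CHROM, POS, REF and ALT match exactly. This is not a description of how CombineVariants or any other specific tool is implemented; `naive_merge` is a name made up for illustration, and the records are the six lines from the example above.

```python
from collections import defaultdict

# The six records from the example above: (CHROM, POS, ID, REF, ALT).
RECORDS = [
    ("contig", 3, "ins_indel_representation1", "A", "ACTCAG"),
    ("contig", 4, "ins_indel_representation2", "G", "CTCAGG"),
    ("contig", 3, "ins_svtype_representation1", "A", "<INS>"),
    ("contig", 4, "ins_svtype_representation2", "G", "<INS>"),
    ("contig", 3, "ins_bnd_1", "A", "ACTCAG[contig:4["),
    ("contig", 4, "ins_bnd_2", "G", "]contig:3]CTCAGG"),
]


def naive_merge(records):
    """Hypothetical merger: group records by exact (CHROM, POS, REF, ALT)."""
    groups = defaultdict(list)
    for chrom, pos, rec_id, ref, alt in records:
        groups[(chrom, pos, ref, alt)].append(rec_id)
    return dict(groups)


groups = naive_merge(RECORDS)
# Every record ends up in its own group, although all six describe one insertion.
print(len(groups))
```

An exact-key comparison like this keeps all six records apart, so an "intersection" of call sets written in different notations would wrongly come out empty for this insertion.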

There does not exist any tool that performs the haplotype sequence reconstruction required to correctly combine SV variants in all cases.

written 6 hours ago by d-cameron1.2k
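
As a rough illustration of what haplotype sequence reconstruction means in the simplest case, the Python sketch below applies the two sequence-resolved indel records from the example to a short reference and compares the resulting haplotypes. The reference string is an assumption: only positions 3 (A) and 4 (G) come from the example, and the flanking bases are invented. `apply_variant` is a hypothetical helper, and it does not cover symbolic (`<INS>`) or breakend records, which is where the hard part lies.

```python
# Assumed reference: only positions 3 (A) and 4 (G) are taken from the example;
# the flanking T bases are invented for illustration.
REFERENCE = "TTAGTT"


def apply_variant(reference, pos, ref, alt):
    """Return the haplotype produced by a sequence-resolved record (1-based POS)."""
    start = pos - 1
    if reference[start:start + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference")
    return reference[:start] + alt + reference[start + len(ref):]


hap1 = apply_variant(REFERENCE, 3, "A", "ACTCAG")  # ins_indel_representation1
hap2 = apply_variant(REFERENCE, 4, "G", "CTCAGG")  # ins_indel_representation2

# Different POS/REF/ALT, identical resulting sequence.
print(hap1 == hap2)
```

Comparing reconstructed haplotypes rather than record fields is what lets the two indel representations be recognised as one event.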
13 days ago by
d-cameron1.2k wrote:

Also, what do SV caller merging tools actually do when they merge VCF files, and is that something we could do ourselves? Please enlighten me.

SV merging is non-trivial due to the notational and detection differences between the various detection tools. Even getting the calls into a standard format is a challenge in itself. E.g. BreakDancer, Socrates, HYDRA, and GRIDSS (my tool, I highly recommend it ;) ) report all events in VCF breakend notation. Other tools use the alternate SVTYPE=INS/DEL/INV/DUP notation, and others report the REF and ALT base sequences directly. Determining that the BND pair of records from one caller, the DUP call from another, and the ALT sequence that is longer than the REF from a third caller are actually the same call is a non-trivial task. On top of this, CNV callers are fundamentally different in that they report (changes in) the abundance of DNA segments instead of the novel DNA sequence adjacencies that breakpoint callers report. Add inexact calling and sequence homology on top of that and you have quite the task ahead.
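
A first step before any comparison is simply working out which of the three notations a record uses. The Python sketch below does this from the ALT field alone; `alt_notation` is a hypothetical helper, and it deliberately ignores edge cases such as single breakends and multi-allelic ALT fields.

```python
def alt_notation(alt):
    """Classify a single VCF ALT allele by the notation it uses.

    Simplified on purpose: single breakends and comma-separated
    multi-allelic ALT fields are not handled.
    """
    if "[" in alt or "]" in alt:
        return "breakend"   # e.g. ACTCAG[contig:4[
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"   # e.g. <INS>, <DUP>
    return "sequence"       # explicit REF/ALT bases


for alt in ["ACTCAG[contig:4[", "<DUP>", "ACTCAG"]:
    print(alt, alt_notation(alt))
```

Knowing the notation only tells you which conversion is needed; deciding whether two converted calls are the same event is the hard part described above.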

I have an R package that addresses the matching of calls from breakpoint-based callers, but it doesn't convert them into a consensus call set, nor does it handle CNV calls.

I need suggestions for better results in SV detection.

Running multiple callers to ensure coverage of the range of SVs you're interested in is a good approach (e.g. a general-purpose SV breakpoint caller, a specialised microsatellite caller, and a CNV caller). Generating a consensus call set based on multiple callers of the same type (e.g. pindel+delly+lumpy+manta+gridss) does not necessarily give you better results. There is considerable overlap in FPs between callers using the same methods, and in many cases you're better off just using the results of the best-in-class caller.

As you only have WES: what classes of SVs are you hoping to detect?

written 13 days ago by d-cameron1.2k
14 days ago by
European union
Rohit1.3k wrote:

SV-Merge and MetaSV already perform merging with Illumina paired-end data. If you have long reads, then give NextSV a try.

written 14 days ago by Rohit1.3k

@Rohit: Mean coverage is around 100x and the read length is 75 bp. Is NextSV suitable for my data? Also, MetaSV is a Python package and I am working in R. Is there an equivalent package in Bioconductor or elsewhere in R? Thanks.

written 14 days ago by vivekruhela10

NextSV is based on long reads, so I don't think you can apply it to your data. IntanSV seems good, though I have never tested it.

written 13 days ago by Rohit1.3k

If you want a standardised format to compare and annotate SVs in R, my StructuralVariantAnnotation package works with a wide range of callers, as well as any VCF file that correctly follows the standard, but it doesn't actually do the merging (this is non-trivial since SV matches are not necessarily transitive).

written 6 hours ago by d-cameron1.2k
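
The non-transitivity is easy to see with a toy matching rule. In the Python sketch below, two calls "match" when they are on the same contig and their positions differ by at most a fixed tolerance. The calls, the 10 bp tolerance, and the `matches` helper are all invented for illustration and are not taken from StructuralVariantAnnotation or any caller.

```python
def matches(a, b, tolerance=10):
    """Toy rule: same contig and positions within `tolerance` bp."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= tolerance


# Hypothetical breakpoint positions reported by three callers.
call_a = ("chr1", 1000)
call_b = ("chr1", 1008)
call_c = ("chr1", 1016)

print(matches(call_a, call_b))  # 8 bp apart
print(matches(call_b, call_c))  # 8 bp apart
print(matches(call_a, call_c))  # 16 bp apart
```

Here A matches B and B matches C, but A does not match C, so there is no single obvious way to collapse the three calls into merged records.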