Recommendations for CNV calling algorithms/programs to benchmark
1
0
4.8 years ago
seraphya • 0

I want to benchmark the time/resource use and breakpoint accuracy of CNV callers on NGS data from multiple individuals at 30x-100x coverage, at specific CNVs.

A few questions I have:

Is there a specific read mapper I should use to create the BAM file, or should I try a few for each SV/CNV caller?

Are there any specific algorithms/programs I should test? The only one I know I will test for sure is CNVnator as that is what has been used until now.

Anything else I should consider?

CNV Assembly • 4.0k views
0

Hi, I am doing a very similar project, comparing algorithms for detecting CNVs. As my standard practice I use the BWA aligner. For CNV detection I have had good results with oncoCNV, CNVkit, and Pindel. I have not tried CNVnator; if you can share your experience, everybody would appreciate it.

10
4.8 years ago
Garan ▴ 690

I'm guessing you're after germline CNV callers since you've mentioned CNVnator. I've included some suggestions below for read-depth based callers including ExomeDepth which is the one I've used the most (reasonably easy to use since it's an R package). I'd have a look at Ximmer if you're interested in comparing CNV callers since it provides a standardised framework for comparing callers out of the box.

You could try read-depth callers, split-read callers that look for breakpoints (although these are more suited to WGS than to targeted panels / exomes), or read-pair callers that look for missing / moved mate pairs: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4394692/ There are also assembly-based callers and callers that use a combination of the above techniques. The approach you use will also depend on the library and the size of CNV you're looking for.

Generally I've only compared CNV callers after a pipeline that uses BWA for alignment and then GATK best practices, since a couple of the callers actually use parts of the GATK suite (like XHMM). Some CNV callers like CANVAS (https://github.com/Illumina/canvas) are optimised for their own workflow (in this case ISAAC).

XHMM (used by ExAC to call their CNVs): http://atgu.mgh.harvard.edu/xhmm/tutorial.shtml http://www.cell.com/ajhg/fulltext/S0002-9297(12)00417-X

ULYSSES: https://github.com/gillet/ulysses

Breakdancer: https://github.com/genome/breakdancer

Frameworks

Ximmer: https://github.com/ssadedin/ximmer https://www.biorxiv.org/content/early/2018/02/06/260927 A framework for running multiple CNV callers together and calculating sensitivity etc. It comes with ExomeDepth, XHMM, cn.MOPS and CoNIFER.

GATK4 germline CNV caller: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11148 I'm not sure if this is available yet, but it should be ready soon. It would be ideal if you want a full GATK best-practice pipeline.

I've mainly used ExomeDepth on targeted panels and found it okay with some tweaks and heavy filtering for false positives.
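For the breakpoint-accuracy part of the question, here is a minimal sketch of how calls could be scored against a truth set, whichever callers you end up running. It assumes calls and truth have already been parsed into simple intervals, and it matches them by reciprocal overlap. The `CNV` class, the `reciprocal_overlap` and `benchmark` functions, and the 0.5 threshold are all hypothetical choices for illustration, not taken from Ximmer or any caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the events cannot match."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return min(inter / (a.end - a.start), inter / (b.end - b.start))


def benchmark(truth, calls, min_ro=0.5):
    """Match each truth CNV to its best-overlapping call (assumed threshold min_ro)."""
    matched_truth, matched_calls, bp_errors = set(), set(), []
    for ti, t in enumerate(truth):
        best, best_ro = None, 0.0
        for ci, c in enumerate(calls):
            ro = reciprocal_overlap(t, c)
            if ro >= min_ro and ro > best_ro:
                best, best_ro = ci, ro
        if best is not None:
            matched_truth.add(ti)
            matched_calls.add(best)
            c = calls[best]
            # Absolute error at the left and right breakpoints, in bp.
            bp_errors.append((abs(c.start - t.start), abs(c.end - t.end)))
    return {
        "sensitivity": len(matched_truth) / len(truth) if truth else 0.0,
        "precision": len(matched_calls) / len(calls) if calls else 0.0,
        "bp_errors": bp_errors,
    }


# Made-up coordinates, purely to show the call pattern.
truth = [CNV("chr1", 1000, 2000, "DEL"), CNV("chr2", 5000, 9000, "DUP")]
calls = [CNV("chr1", 1100, 2050, "DEL"), CNV("chr3", 100, 200, "DEL")]
result = benchmark(truth, calls)
```

Read-depth callers on exomes will usually only place breakpoints to the nearest target, so it is probably worth reporting the breakpoint errors separately per caller type rather than pooling them.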

0

Can Breakdancer be applied to paired-end whole-exome sequencing data?

0

GATK4 GermlineCNVCaller is available now. It'd be great to see how it stacks up against some of the older methods out there.
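For the time/resource side of such a comparison, one option is to wrap every caller in the same small harness. The sketch below is only an assumption about how that might look: `run_and_measure` is a hypothetical helper, it relies on the Unix-only `resource` module, and the command shown is a placeholder rather than a real caller invocation.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run one command and report wall time, child CPU time and peak RSS."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    t0 = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    wall = time.perf_counter() - t0
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": proc.returncode,
        "wall_s": wall,
        # User + system CPU time consumed by the child process tree.
        "cpu_s": (after.ru_utime - before.ru_utime)
                 + (after.ru_stime - before.ru_stime),
        # High-water mark over all children waited for so far
        # (KiB on Linux, bytes on macOS).
        "max_rss": after.ru_maxrss,
    }


# Placeholder command; substitute the real caller command line here.
stats = run_and_measure([sys.executable, "-c", "print('caller goes here')"])
```

Because `ru_maxrss` for children is a running maximum, it is safest to start a fresh harness process for each caller so one tool's peak memory does not mask another's.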