Question

Comparing variant calls

12 months ago

eesiribloom ▴ 80

I want to evaluate how many variants from a high-confidence short read consensus callset are called by long-read callers (with ONT data).

So far I have tried bcftools isec, bedtools jaccard, and bedtools intersect with default parameters, but these feel a bit primitive.

For tools such as these, what sort of parameters are recommended, e.g. requiring reciprocal overlap or filtering based on MAF? Given that this compares two different sequencing technologies, I'm unsure how strict to be about agreement between variant calls.

For parameters such as reciprocal overlap, would people recommend adjusting the threshold based on variant size? For example, a multi-megabase deletion may need a more stringent % overlap, since it is "easier" for any variant to overlap such a large deletion by chance.
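To make the idea concrete, here is a minimal Python sketch of a reciprocal-overlap test whose threshold tightens as the variant gets larger. The function names and the size bins (0.5 / 0.7 / 0.9) are placeholders chosen for illustration, not recommended values, and intervals are assumed to be half-open (start, end) coordinates as in BED.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for half-open intervals [start, end)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    # Reciprocal overlap: the overlap must cover a fraction of BOTH intervals,
    # so the limiting value is the fraction relative to the longer interval.
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def required_overlap(length):
    """Hypothetical size-dependent threshold (illustrative values only)."""
    if length < 1_000:
        return 0.5
    if length < 1_000_000:
        return 0.7
    return 0.9


def same_variant(a, b):
    """a and b are (chrom, start, end) tuples; True if they pass the test."""
    a_chrom, a_start, a_end = a
    b_chrom, b_start, b_end = b
    if a_chrom != b_chrom:
        return False
    # Use the longer of the two calls to pick the threshold.
    longest = max(a_end - a_start, b_end - b_start)
    frac = reciprocal_overlap(a_start, a_end, b_start, b_end)
    return frac >= required_overlap(longest)
```

With these placeholder bins, two 100 bp calls that share 50 bp would count as a match, while two 2 Mb deletions sharing 75% of their length would not.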

Are there other tools or methods I could use? I'm struggling to find standard methods in the literature...

bedtools VCF bcftools • 670 views

updated 10 months ago by Ram 43k • written 12 months ago by eesiribloom ▴ 80

Hey, not a direct answer, but I recommend reading this paper: Krusche, Peter, et al. "Best practices for benchmarking germline small-variant calls in human genomes." Nature Biotechnology 37.5 (2019): 555-560.

12 months ago by MatthewP ★ 1.4k

score 0 · Answer 1 · 2023-04-20

The Real Time Genomics toolset (RTG Tools) has the vcfeval tool and other related ones, which seem to be the most commonly used in papers dealing with variant call comparisons:

https://github.com/RealTimeGenomics/rtg-tools

Manual:

https://cdn.rawgit.com/RealTimeGenomics/rtg-tools/master/installer/resources/tools/RTGOperationsManual.pdf

Search for papers on evaluating variant calls; there are plenty around. As far as I recall, the process is surprisingly subjective.
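For a rough sense of what such a comparison reports, here is a minimal Python sketch that counts how many variants from a reference ("truth") callset are recovered by a second callset. It assumes both sets have already been normalized and are given as (chrom, pos, ref, alt) tuples, and it uses naive exact matching, so it will miss equivalent variants that are represented differently. The function names are hypothetical; this is not how the RTG tools work internally.

```python
def variant_key(chrom, pos, ref, alt):
    """Exact-match key for a variant; assumes calls are already normalized."""
    return (chrom, int(pos), ref.upper(), alt.upper())


def compare_callsets(truth, query):
    """Count shared and private variants and derive recall and precision.

    truth, query: iterables of (chrom, pos, ref, alt) tuples.
    """
    truth_keys = {variant_key(*v) for v in truth}
    query_keys = {variant_key(*v) for v in query}

    tp = len(truth_keys & query_keys)  # in both callsets
    fn = len(truth_keys - query_keys)  # truth variants the query missed
    fp = len(query_keys - truth_keys)  # query variants absent from truth

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fn": fn, "fp": fp,
            "recall": recall, "precision": precision}


if __name__ == "__main__":
    # Toy data for illustration only.
    short_read = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
                  ("chr2", 40, "G", "GA")]
    long_read = [("chr1", 100, "A", "G"), ("chr2", 40, "G", "GA"),
                 ("chr3", 7, "T", "C")]
    print(compare_callsets(short_read, long_read))
```

Here recall is the fraction of the short-read consensus variants that the long-read caller also reports, which is the quantity asked about in the question.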