bcftools variant calling
3.2 years ago

Hi, I am using bcftools to perform variant calling on my .bam files. I want to understand exactly which statistical model bcftools uses for variant calling. I went through the bcftools documentation (http://samtools.github.io/bcftools/bcftools.html), but are there any other resources that could help me understand? Thank you in advance.

genome Assembly SNP alignment • 1.2k views

I would highly appreciate any suggestions at this point!


You could use pipelines such as VarScan or Strelka.


Hi, I am trying to understand how variants are called and reported by bcftools. What specific statistical model does it use?

3.2 years ago
Dave Carlson ★ 1.7k

You probably want to check out Li 2011:

A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data

Abstract

Motivation: Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of the next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications press for the development of new methods for analyzing sequence data with uncertainty.

Results: We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves comparable accuracy to alternative methods for estimating site allele count, for inferring allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that for discovering rare events, mismapping is frequently the leading source of errors.


Thank you so much! This is a good paper, but the method is still not clear enough to me. Is there a better source?


I don't know of a better paper, I'm afraid. I believe that Li 2011 describes the algorithms used by samtools/bcftools for calculating genotype likelihoods and calling variants. More recent versions of bcftools also implement a multiallelic calling model (bcftools call -m, i.e. --multiallelic-caller), which is briefly described here.
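If it helps to see the idea in code, here is a minimal sketch of the simple independent-error genotype likelihood as I understand it from Li 2011. It is not the bcftools implementation: the function names and the toy pileup are made up for illustration, every non-reference base is lumped into a single "alternate" class, and mapping quality and error dependency between reads are ignored.

```python
import math

def genotype_log10_likelihoods(bases, quals, ref, ploidy=2):
    """log10 L(g) for g = 0..ploidy copies of the reference allele.

    Hypothetical helper: assumes independent base errors and treats
    every non-reference base as the same alternate allele.
    """
    result = []
    for g in range(ploidy + 1):
        log_l = 0.0
        for base, q in zip(bases, quals):
            eps = 10 ** (-q / 10)  # Phred base quality -> error probability
            if base == ref:
                # Read shows the reference base: a true ref copy read correctly,
                # or an alt copy misread as ref.
                p = ((ploidy - g) * eps + g * (1 - eps)) / ploidy
            else:
                # Read shows a non-reference base.
                p = ((ploidy - g) * (1 - eps) + g * eps) / ploidy
            log_l += math.log10(p)
        result.append(log_l)
    return result

def phred_scaled(log10_likelihoods):
    """Normalise so the best genotype is 0, on the Phred scale."""
    best = max(log10_likelihoods)
    return [round(-10 * (l - best)) for l in log10_likelihoods]

# Toy pileup: 4 reference (A) and 4 alternate (G) bases, all at Q30.
pl = phred_scaled(genotype_log10_likelihoods("AAAAGGGG", [30] * 8, "A"))
print(pl)  # index = number of reference copies (0, 1, 2)
```

With an even split of reference and alternate bases the heterozygous genotype (one reference copy) gets the best score, and both homozygous genotypes are penalised heavily. A caller then combines likelihoods like these with a prior (for example, from allele frequencies across samples) to decide whether a site is variant, which is the part the paper spends most of its time on.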


Hi, I am still not clear on what statistics bcftools uses for variant calling.
