Variant Calling In Samtools And Freebayes
2
1
Entering edit mode
10.1 years ago
sorrymouse ▴ 120

I've used both on the same input files, with essentially default parameters. I am using multiple samples that are supposedly from two populations.

freebayes --fasta-reference reference.fa reads.bam > out.vcf

samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | bcftools view var.raw.bcf > var.flt.vcf

Samtools identifies snps that freebayes does not, and its very strange. For example, one very strong snp (quality of 999, looks good in tview in all samples) does not show up in freebayes. What is weird is that it does show up if you only process one population into vcd, the population that is almost entirely homozygous reference. The second population, which is fixed homozygous alternate, if included in the analysis means the SNP doesn't show up. Which is really weird.

I have no idea what is different about these snp calling methods such that something like this would happen, it is not a snp that there is much doubt about. Does anyone know?

samtools • 7.6k views
ADD COMMENT
3
Entering edit mode
10.1 years ago
User 59 13k

First of all if you want help to assess the call posting the mpileup at that position might help along with the relevant lines from the VCF's

However it is a well-known fact that SNP and indel callers are far more discordant than you would first expect:

http://genomemedicine.com/content/5/3/28/abstract

You may also like this post from Brad Chapman on the topic:

http://bcbio.wordpress.com/2013/05/06/framework-for-evaluating-variant-detection-methods-comparison-of-aligners-and-callers/

ADD COMMENT
0
Entering edit mode

Hi Daniel,

I am just wondering you use freebayes to call variants from RNAseq data, if so, which aligner you found is the best companion for it? Thanks,

ADD REPLY
0
Entering edit mode
10.1 years ago
sorrymouse ▴ 120

Here is the variant I wrote about: chr . A C 999 . . GT:PL:GQ 1/1:255,81,0:78 1/1:255,148,0:99 ./.:0,0,0:3 1/1:255,90,0:87 ./.:0,0,0:3 1/1:119,24,0:21 1/1:255,181,0:99 0/1:255,0,255:99 ./.:0,0,0:3 0/0:0,66,255:63 0/0:0,66,255:63 0/0:0,42,255:39 0/0:0,196,255:99 0/0:0,42,232:39

And here is a similar variant with a much lower quality that freebayes always calls: chr . A C 7.15 . . GT:PL:GQ 0/0:0,187,255:99 0/0:0,250,255:99 0/0:0,96,255:99 0/0:0,99,255:99 0/0:0,0,0:12 0/0:0,126,255:99 0/0:0,90,255:99 0/0:0,255,255:99 0/0:0,0,0:12 0/0:0,163,255:99 0/0:0,151,255:99 0/0:0,36,255:48 0/0:0,255,255:99 0/1:47,0,149:35

I'm aware that there are discrepancies among snp callers, I thought there might be answer here related to how freebayes deals with its assumptions about population structures and haplotypes or something.

ADD COMMENT

Login before adding your answer.

Traffic: 1795 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6