Freebayes And Mpileup Filtering Query
10.9 years ago
robert • 0

I have performed SNP calling on tomato resequencing data using two approaches: Samtools mpileup, followed by filtering on D100 coverage, and Freebayes, after running the samtools BAQ calculation (-E) beforehand. I am aiming to identify SNPs in these two genomes against a reference, and also to filter those they have in common against the reference into a separate file (I found how to do this part elsewhere on this website).

When I filter the SNPs from a 3.2 GB Freebayes file I get a 750 MB file, and when I filter for indels I get another 750 MB file. What is being filtered out in the remaining 1.7 GB, or is this just extra columns being removed? If I take both SNPs and indels, I thought that was everything it found?

It appears that the two files filtered using --keep-only-indels and --remove-indels in VCFtools have the same contents, with SNPs and indels mixed, so I may have to find an alternative method of separating SNPs and indels?

Could someone also recommend which other filters to use on the SNPs file, or whether further filtering is recommended at all? I was going to remove the Freebayes SNPs with coverage above 100, as the samtools filter does.

If people want the command lines and VCFtools commands I used for reference, I can post them.


snp • 4.6k views

You might want to break your sentences up a bit; it's hard to understand what you've done and what you're asking. If I've understood correctly, you've called variants using SAMtools and Freebayes. What I'm less sure about is whether you've submitted one genome to SAMtools and the other to Freebayes, or run both genomes through each variant caller. Please clarify. Give a chronology of all the steps you took, one for SAMtools and one for Freebayes, and clarify what your aim is.

10.8 years ago
Erik Garrison ★ 2.4k

I would suggest filtering on QUAL rather than just depth. In freebayes the QUAL is accounting for observation quality, depth, possible

If you are getting a large report with many low-QUAL artifacts that you don't want to investigate later, you can use vcffilter immediately on the command line after freebayes, as such:

freebayes -f ref.fa aln.bam | vcffilter -f "QUAL > 20" > out.vcf

This would keep any records with QUAL > 20.
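For anyone who wants to see what that filter does record by record, here is a minimal Python sketch of the same idea. It is not how vcffilter is implemented; the function name filter_by_qual and the three example records are made up for illustration, and it only assumes that QUAL is the sixth tab-separated column of a VCF record.

```python
import io


def filter_by_qual(lines, min_qual=20.0):
    """Yield header lines unchanged and records whose QUAL exceeds min_qual.

    Hypothetical helper for illustration; it mimics the effect of
    vcffilter -f "QUAL > 20" but is not vcffilter's implementation.
    """
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column header lines are always kept.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the sixth column of a VCF record
        # A missing QUAL (".") cannot pass a numeric threshold.
        if qual != "." and float(qual) > min_qual:
            yield line


# Made-up records, only to show the behaviour of the filter.
example_vcf = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t35.2\t.\tDP=14\n"
    "chr1\t250\t.\tC\tT\t3.1\t.\tDP=9\n"
    "chr1\t400\t.\tG\tGA\t20\t.\tDP=22\n"
)

kept = list(filter_by_qual(io.StringIO(example_vcf)))
for line in kept:
    print(line, end="")
```

Note that the comparison is strict, so the record with QUAL exactly 20 is dropped along with the low-QUAL one.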

Another solution is to set --pvar 0.1 or higher when calling. Note that keeping QUAL > 20 would be roughly equivalent to --pvar 0.99. There is no default filter, as I cannot make a consistent assumption about whether users want sensitivity or specificity to be maximized.
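The correspondence between a QUAL threshold and a probability threshold follows from the phred scale. A small sketch, assuming QUAL is the phred-scaled probability that the call is wrong, as in the VCF definition; the function names qual_to_prob and prob_to_qual are hypothetical helpers, not part of freebayes or vcflib.

```python
import math


def qual_to_prob(qual):
    """Probability that the site is variant, given a phred-scaled QUAL."""
    return 1.0 - 10.0 ** (-qual / 10.0)


def prob_to_qual(prob):
    """Phred-scaled QUAL corresponding to a probability of variation."""
    return -10.0 * math.log10(1.0 - prob)


# QUAL 20 means a 1-in-100 chance the call is wrong, i.e. P = 0.99.
print(round(qual_to_prob(20), 4))
# A probability threshold of 0.1 corresponds to a QUAL below 1.
print(round(prob_to_qual(0.1), 2))
```

So a probability cutoff of 0.1 is a very permissive threshold compared with QUAL > 20.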

