I aligned Illumina reads of a haploid fungal genome against a reference sequence and then called SNPs with Freebayes:
/home/ubuntu/freebayes/bin/freebayes -f reference.fasta alignment.bam --ploidy=1 --min-alternate-qsum 30 -F 0.05 > snps.vcf
I then followed this tutorial to filter the output of freebayes further using these commands with vcffilter:
- -f "SRP > 20" (strand balance probability for the reference allele)
- -f "SAP > 20" (strand balance probability for the alternate allele)
- -f "EPP > 20" (end placement probability)
- -f "QUAL > 30" (Phred-scaled variant quality)
- -f "DP > 100" (depth)
This reduced the SNP count from about 1,700 to about 15.
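To see which of these thresholds accounts for most of the drop, one option is to count how many records pass each filter on its own. Below is a rough sketch using plain awk rather than vcffilter; the function names count_info_gt and count_qual_gt are hypothetical helpers, not part of vcflib or freebayes, and the sketch assumes each INFO tag holds a single numeric value (one ALT allele per record).

```shell
#!/bin/sh
# Hypothetical helpers (not part of vcflib or freebayes).

# count_info_gt FILE TAG THRESHOLD
# Prints the number of VCF records whose INFO tag TAG is greater than THRESHOLD.
# Assumption: one numeric value per tag (a single ALT allele per record);
# for comma-separated values only the first number is compared.
count_info_gt() {
    awk -F'\t' -v tag="$2" -v thr="$3" '
        /^#/ { next }                       # skip header lines
        {
            n = split($8, kv, ";")          # INFO is column 8
            for (i = 1; i <= n; i++) {
                split(kv[i], p, "=")        # p[1] = tag name, p[2] = value
                if (p[1] == tag && p[2] + 0 > thr + 0) { c++; break }
            }
        }
        END { print c + 0 }
    ' "$1"
}

# count_qual_gt FILE THRESHOLD
# Prints the number of records whose QUAL (column 6) is greater than THRESHOLD.
count_qual_gt() {
    awk -F'\t' -v thr="$2" '
        !/^#/ && $6 + 0 > thr + 0 { c++ }
        END { print c + 0 }
    ' "$1"
}

# Example usage against the freebayes output (uncomment to run):
# for tag in SRP SAP EPP; do
#     echo "$tag > 20: $(count_info_gt snps.vcf "$tag" 20)"
# done
# echo "DP > 100: $(count_info_gt snps.vcf DP 100)"
# echo "QUAL > 30: $(count_qual_gt snps.vcf 30)"
```

A filter whose individual count is already close to 15 would be the one responsible for most of the reduction.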
My questions are as follows:
- Is there a set of filtering techniques you start with that you then tweak for different applications?
- This filtering strategy is clearly stringent, since it cut the number of SNP calls so sharply. Is it too aggressive?
Thanks for any advice!