I was wondering if anyone could suggest a good pipeline for SNP calling in haploid organisms.
I am currently using this one:
samtools mpileup -guf reference_genome.fa target_organism.sort.bam |
bcftools view -cg - |
vcfutils.pl varFilter -Q 20 - > result_vcf
The problem with this pipeline is that it reports many sequencing errors as SNPs. I end up with many heterozygous SNP calls that are caused by a single read carrying an error (for example, 8 reads match the reference and 1 contains the error).
If I filter out the heterozygous SNPs, I risk losing information for the same reason: a single read with an error can cause a real SNP to be called heterozygous and therefore be excluded.
I am considering filtering the SNPs by read depth, but my coverage is not very high, and I would risk losing data in regions covered by only 2 or 3 reads.
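To make the 8-versus-1 case concrete, here is a rough Python sketch of the kind of decision I would expect at a haploid site. It is only an illustration of the arithmetic, not part of the pipeline above: the function name `classify_haploid_site` and the 0.8 threshold are assumptions I made up for the example, and it ignores base and mapping qualities entirely.

```python
# Hypothetical illustration only: the function name and the 0.8 threshold
# are assumptions, not anything provided by samtools or bcftools.
def classify_haploid_site(ref_reads, alt_reads, min_alt_fraction=0.8):
    """Classify one site of a haploid genome from its read counts."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return "no_call"  # nothing covers the site
    alt_fraction = alt_reads / depth
    if alt_fraction >= min_alt_fraction:
        return "alt"  # nearly all reads support the variant
    if alt_fraction <= 1 - min_alt_fraction:
        return "ref"  # the few alternate reads look like errors
    return "ambiguous"  # mixed support, which a haploid should not show


print(classify_haploid_site(8, 1))  # 1 of 9 reads carries the alternate base
print(classify_haploid_site(0, 3))  # low depth, but all reads agree
```

Because the rule uses the fraction of reads rather than the absolute depth, a site covered by only 3 reads is still kept when they all agree, while the 8-versus-1 site is treated as reference rather than heterozygous.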