Question

False negative SNPs and depth of coverage with sam/bcftools

9.2 years ago

marina.v.yurieva ▴ 570

I'm developing a variant calling pipeline for SNP detection in yeast and have 3 biological samples from yeast WGS which are supposed to have 100% of their SNPs in common. The raw calls show only 80% SNP overlap, and filtering on read quality, mapping quality, read depth, etc. only makes things worse (~60-76% common SNPs, depending on the filter). When you look at the calls unique to each sample, you can see that the variants are also present in the other samples, but at lower depth, and were not called because of that. The sequencing depth of these samples is similar (20x, 20x and 19x), but there are some regions where depth varies, which causes a lot of false negatives.
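To make the overlap figure concrete, here is a minimal sketch of one way such a percentage could be computed from per-sample VCFs. It is an illustration, not necessarily the exact calculation used above: it assumes "overlap" means sites called in every sample divided by sites called in at least one sample, it counts only biallelic single-base substitutions, and the function names `snp_keys` and `shared_snp_percent` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the overlap definition below is an assumption.

# Print one key (CHROM:POS:REF>ALT) per biallelic SNP in a VCF, sorted and unique.
# "gzip -cdf" decompresses .vcf.gz and passes plain-text VCF through unchanged
# (GNU gzip behaviour). Header lines, indels and multi-allelic sites are skipped.
snp_keys() {
    gzip -cdf "$1" |
        awk -F'\t' '!/^#/ && $4 ~ /^[ACGTN]$/ && $5 ~ /^[ACGT]$/ { print $1 ":" $2 ":" $4 ">" $5 }' |
        LC_ALL=C sort -u
}

# Print the percentage of SNPs found in all given VCFs,
# relative to the SNPs found in at least one of them.
shared_snp_percent() {
    n=$#
    for f in "$@"; do
        snp_keys "$f"
    done |
        LC_ALL=C sort |
        uniq -c |
        awk -v n="$n" '
            { total++; if ($1 == n) shared++ }
            END {
                if (total) printf "%.1f\n", 100 * shared / total
                else print "NA"
            }'
}
```

It could be called as `shared_snp_percent sample1.vcf.gz sample2.vcf.gz sample3.vcf.gz`, where the second and third file names are placeholders for the other two samples. Because each file contributes a site at most once, a site counted as many times as there are files is one that was called in every sample.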

My command line is:

samtools mpileup -d 8000 -Euf yeast.fasta sample1.bam  | bcftools call -vcO z -o sample1.vcf.gz

I haven't done much variant detection analysis and don't know if this is a well-known problem, and Google didn't turn up anything. Is there a way to get around it? How common is it, and what do people do about it?

SNP bcftools samtools • 2.3k views

updated 2.0 years ago by Ram 43k • written 9.2 years ago by marina.v.yurieva ▴ 570