Question: Discrepancy between number of SNP calls using samtools/bcftools when --skip-indels flag used
0
gravatar for TrentGenomics
9 months ago by
TrentGenomics30 wrote:

Hi all,

I've used the following command to call SNPs between two species:

samtools mpileup -gf NFS_reference.fasta SFS_alignment_bowtie2.coordSorted.bam | bcftools call -Ou -mv | bcftools filter -s LowQual -e '%QUAL<20 || DP>100' > NFS_SFS_SNPs.var.flt.vcf

The above command called ~180,000 SNPs. When I added the --skip-indels to the command below and re-submitted the job, ~700,000 SNPs were called.

samtools mpileup --skip-indels -gf NFS_reference.fasta SFS_alignment_bowtie2.coordSorted.bam | bcftools call -Ou -mv | bcftools filter -s LowQual -e '%QUAL<20 || DP>100' > NFS_SFS_SNPs.var.flt.vcf

Does anyone have any idea as to why there is such a large difference between the number of SNPs that are called when I add the --skip-indels flag?

Thank you,

MB

snp software error • 332 views
ADD COMMENTlink written 9 months ago by TrentGenomics30
1

Samtools ignores SNPs adjacent to indels. Or, more specifically, if there is a complex event that cannot be represented as a pure substitution or pure indel - such as GAT -> GC - Samtools will miscall it as GA -> G. I will guess that --skip-indels would unmask these hidden substitutions, though I would not have expected that to make such a big difference.

If I were you, I would look at the locations of the different calls in IGV to determine what is going on, and which version is correct (if either). I'd also suggest trying a different variant-caller to get a second opinion.

ADD REPLYlink written 9 months ago by Brian Bushnell15k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1050 users visited in the last hour