How To Interpret -D File In Samtools Variant Calling Pipeline?
1
1
Entering edit mode
13.1 years ago
Ian 6.0k

I have been using using the samtools pipeline for variant calling for a while now and have just moved over to using mpileup. I am using the suggested method:

  1. samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
  2. bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

I am aware at the vcfutils step that '-d' specifies the minimum read depth and '-D' specifies the maximum read depth. However, what is not clear to me is, for example, if -D = 100 and the coverage of a SNP is 200, what happens? Do 100 random reads get used, or more worryingly is that SNP not reported?

Thanks.

samtools vcftools • 4.9k views
ADD COMMENT
5
Entering edit mode
13.1 years ago
Farhat ★ 2.9k

From the source code it appears that the SNP is not reported if the coverage exceeds the maximum depth.

ADD COMMENT
2
Entering edit mode

Farhat gives the correct answer.

ADD REPLY
0
Entering edit mode

I have reopened this questions as it is crucial whether a SNP uses sampled reads or is skipped if coverage is above -D. Thanks for your effort Farhat, but 'appears' is not definite enough.

ADD REPLY
0
Entering edit mode

Thanks for the confirmation, Li. I had only taken a cursory look so didn't want to be too certain of the answer.

ADD REPLY
0
Entering edit mode

My understanding is that if unique or "reliable" (from FAQ of samtools ) sequences are used for snp calling, the -D option is meaningless, since the high coverage is not caused by duplication of genome segments.

ADD REPLY

Login before adding your answer.

Traffic: 1886 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6