Samtools VCF Output
11.2 years ago
Kssr ▴ 110

What does X in the ALT column stand for when I use the -A option (keep all possible alternate alleles at variant sites) in the bcftools view command for calling SNPs?

 chr2 642 . C T,X 85.6 . DP=306;VDB=0.0002;AF1=0.02819;AC1=1;DP4=144,144,3,2;MQ=37;FQ=85.7;PV4=1,0.19,1,1 
GT:PL:DP:SP:GQ  0/0:0,36,255,36,255,255:12:0:48  0/0:0,6,72,6,72,72:2:0:18  0/0:0,72,255,72,255,255:24:0:84  
**0/1:121,0,129,136,144,255:10:0:99**  0/0:0,57,255,57,255,255:19:0:69  0/0:0,48,255,48,255,255:16:0:60 0/0:0,54,255,54,255,255:18:0:66    0/0:0,21,187,21,187,187:7:0:33 0/0:0,66,255,66,255,255:22:0:78  0/0:0,30,234,30,234,234:10:0:42  0/0:0,51,255,51,255,255:17:0:63  0/0:0,36,246,36,246,246:12:0:48    0/0:0,69,255,69,255,255:23:0:81  0/0:0,45,255,45,255,255:15:0:5

I also see from the DP4 values that only 5 reads support the alternate allele. When we apply a depth filter, as far as I know, we filter by total coverage, i.e. DP. Looking at DP4, how can this be called a good SNP (SNP quality of 85.6)?
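To make the DP4 arithmetic concrete, here is a minimal Python sketch that pulls DP4 out of the INFO string of the record above and counts the reads supporting each allele. It assumes the usual samtools meaning of DP4 (high-quality ref-forward, ref-reverse, alt-forward, alt-reverse bases); the helper names `parse_info` and `dp4_summary` are made up for illustration and are not part of samtools or bcftools.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def dp4_summary(info):
    """Return (ref_reads, alt_reads, alt_fraction) from the DP4 tag.

    Assumes DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse.
    """
    ref_fwd, ref_rev, alt_fwd, alt_rev = (
        int(x) for x in parse_info(info)["DP4"].split(",")
    )
    ref_reads = ref_fwd + ref_rev
    alt_reads = alt_fwd + alt_rev
    total = ref_reads + alt_reads
    alt_fraction = alt_reads / total if total else 0.0
    return ref_reads, alt_reads, alt_fraction


# INFO column copied from the record in the question.
info = "DP=306;VDB=0.0002;AF1=0.02819;AC1=1;DP4=144,144,3,2;MQ=37;FQ=85.7;PV4=1,0.19,1,1"
ref_reads, alt_reads, alt_fraction = dp4_summary(info)
print(ref_reads, alt_reads, alt_fraction)
```

For this record the four DP4 counts sum to 293, which is lower than DP=306, so the two tags are not counting exactly the same set of reads.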

variant samtools vcf • 5.9k views
11.2 years ago

85.6 is not a great score. With multiple samples in one VCF, a really good SNP will have a score of 999.


Thanks, swbarnes. I have a few more questions. I am actually looking into filtering SNPs.

1. -d and -D: these filter SNPs on read depth. We would be more interested in the reads supporting the non-ref allele (the last two values of DP4). I don't see the use of total read depth.

2. I filtered the reads by choosing a MAPQ of 20 in the samtools mpileup step. In that case, I expect an RMS MQ >= 20 in the VCF. I have seen some papers filter SNPs again by RMS MQ < 25. Any idea why this is done?

I feel that these questions are related to my previous question, but I will be happy to start a new post for them. Any help is appreciated.

11.2 years ago

I am not sure what it means, but here is the reference:

http://samtools.sourceforge.net/mpileup.shtml

In the BCF generated by SAMtools, an non-ref base 'X' represents an base has not been seen from the alignment data. Such a base is necessary to evaluate the probability of missing a non-ref allele due to sampling fluctuation.
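As a rough illustration of why the highlighted sample carries six PL values once X is listed as a second ALT allele, the sketch below labels each PL value with its genotype. It assumes the output follows the genotype ordering defined in the VCF specification (genotype j/k with j <= k sits at index k*(k+1)/2 + j); whether this old bcftools version follows that ordering exactly is an assumption, and the function names are hypothetical.

```python
def pl_index(a, b):
    """Index of the unordered diploid genotype a/b in the PL list (VCF spec ordering)."""
    j, k = sorted((a, b))
    return k * (k + 1) // 2 + j


def label_pl(alleles, pl):
    """Map each genotype label (e.g. 'C/T') to its PL value."""
    labelled = {}
    for k in range(len(alleles)):
        for j in range(k + 1):
            labelled[f"{alleles[j]}/{alleles[k]}"] = pl[pl_index(j, k)]
    return labelled


# REF C, ALT T and X; PL values from the highlighted 0/1 sample in the question.
alleles = ["C", "T", "X"]
pl = [121, 0, 129, 136, 144, 255]
labelled = label_pl(alleles, pl)
for genotype, value in labelled.items():
    print(genotype, value)
```

Under that assumption the smallest PL (0) belongs to C/T, matching the 0/1 call, while the three genotypes involving X get the values 136, 144 and 255.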
