I asked this question on SEQanswers, but have not gotten any response yet.
I have a question about calling SNPs from RNA-seq data (Illumina single-read, bacterial) with mpileup. I got my VCF file and I'm struggling a little to understand the output. Suppose I have:
gi|xxx|emb|xxx| 143630 . C T 999 . DP=490 VDB=0.0004 AF1=1 AC1=4 DP4=1,0,211,276 MQ=20 FQ=-286 PV4=0.43,1.5e-11,1,0.048 GT:PL:GQ 1/1:255,255,0:99 1/1:255,255,0:99
I'm pretty sure of the following:
- 143630 is the position of my SNP
- C is the base in the reference genome and T is the alternate variant (the actual SNP)
- 999 is the QUAL score. The higher it is, the better the chances that the call is genuine
- DP is the read depth (coverage) at that specific position
- DP4 gives four read counts: forward and reverse reads supporting the reference, then forward and reverse reads supporting the alternate call
- MQ is the mapping quality
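
To make the fields listed above concrete, here is a minimal Python sketch that pulls them out of the record and adds up the DP4 counts. It assumes the record is whitespace-separated exactly as pasted here (an actual VCF file uses tabs between columns and semicolons between INFO entries), and the function name `parse_pasted_record` is just an illustrative helper, not part of samtools or any library.

```python
# The record as pasted above. Assumption: whitespace-separated, as shown in
# this post; a real VCF file uses tabs between columns and ';' inside INFO.
record = (
    "gi|xxx|emb|xxx| 143630 . C T 999 . "
    "DP=490 VDB=0.0004 AF1=1 AC1=4 DP4=1,0,211,276 MQ=20 FQ=-286 "
    "PV4=0.43,1.5e-11,1,0.048 GT:PL:GQ 1/1:255,255,0:99 1/1:255,255,0:99"
)


def parse_pasted_record(line):
    """Split the pasted record into its fixed columns and an INFO dict."""
    tokens = line.split()
    # The first seven columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER.
    chrom, pos, _id, ref, alt, qual, _filter = tokens[:7]
    info = {}
    for token in tokens[7:]:
        if "=" not in token:
            break  # reached the FORMAT column (GT:PL:GQ)
        key, value = token.split("=", 1)
        info[key] = value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": float(qual),
        "info": info,
    }


parsed = parse_pasted_record(record)

# DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse read counts.
ref_fwd, ref_rev, alt_fwd, alt_rev = (
    int(x) for x in parsed["info"]["DP4"].split(",")
)
ref_reads = ref_fwd + ref_rev
alt_reads = alt_fwd + alt_rev
alt_fraction = alt_reads / (ref_reads + alt_reads)

print(parsed["pos"], parsed["ref"], parsed["alt"], parsed["qual"])
print("DP =", parsed["info"]["DP"], "MQ =", parsed["info"]["MQ"])
print("ref reads:", ref_reads, "alt reads:", alt_reads)
print(f"alt fraction among DP4 reads: {alt_fraction:.3f}")
```

For this record the DP4 counts give 1 read supporting C and 487 supporting T. That total (488) is slightly below DP=490, which would be consistent with DP4 counting only the high-quality bases.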
Now, here are my questions:
- VDB is supposed to be Variant Distance Bias. What exactly does it mean and how do I interpret it?
- AF1 is the allele frequency. Does a value of 1 mean that all the reads are calling the SNP? If I have AF1=0.5, does it mean that half of the reads are calling the reference nucleotide while the other half are calling the SNP?
- What the heck is AC1? It's a maximum-likelihood estimate, okay, but how do you interpret it?
- How do you interpret FQ (Phred probability), i.e. lower vs higher?
- PV4 is a total mess... Any insight would be greatly appreciated.
- GT:PL:GQ: same as above.
I know that this is probably very basic for most of you, but I'm just trying to make some sense out of it...
Thank you all in advance,