Understanding VCF abbreviations
6.0 years ago
aharnishi02 ▴ 80

My VCF contains the following:

DP=33;VDB=0.0951013;SGB=-0.691153;RPB=0.527181;MQB=0.00139564;MQSB=0.999702;BQB=0.328758;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=4,5,8,10;MQ=47

and

GT:NR:DP:SR:VR:VA:SB:ABQ:AMQ: 1/0:468.73038:23:65.22:65.22:.:24.93:33.53:220.13

Could you please help me find out what these abbreviations stand for?
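To see which value belongs to which abbreviation, it helps to split the two strings the way VCF lays them out. INFO is a semicolon-separated list of KEY=VALUE pairs (or bare flags). The FORMAT column is a colon-separated list of keys whose values appear, in the same order, in each sample column. Below is a minimal Python sketch. It assumes the FORMAT keys and the sample values are two separate (tab-delimited) columns; the colon just before the space in the pasted line looks like a copy artifact. The function names are illustrative only.

```python
# Values copied from the question above.
info = ("DP=33;VDB=0.0951013;SGB=-0.691153;RPB=0.527181;MQB=0.00139564;"
        "MQSB=0.999702;BQB=0.328758;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;"
        "DP4=4,5,8,10;MQ=47")
fmt = "GT:NR:DP:SR:VR:VA:SB:ABQ:AMQ"
sample = "1/0:468.73038:23:65.22:65.22:.:24.93:33.53:220.13"


def parse_info(field: str) -> dict:
    """Split an INFO column into {key: value}; bare flags map to True."""
    result = {}
    for entry in field.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value if sep else True
    return result


def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair each FORMAT key with the sample value at the same position."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


info_values = parse_info(info)
sample_values = parse_sample(fmt, sample)
print(info_values["DP4"])   # 4,5,8,10
print(sample_values["VA"])  # . (the VCF marker for a missing value)
for key, value in sample_values.items():
    print(f"{key}\t{value}")
```

This only tells you which number goes with which key; what each key *means* still has to come from the file's header.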

6.0 years ago

Could you please help me get the expansions for these abbreviations??

They're defined in the VCF header.

For example:

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
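Each ##INFO / ##FORMAT header line is a small KEY=VALUE list inside angle brackets. The Description is double-quoted because it may itself contain commas or = signs, as "MQ=255" does here. Below is a hedged Python (3.9+) sketch that pulls those attributes out with a regular expression. `parse_meta_line` is a hypothetical helper name, and the sketch assumes well-formed lines rather than covering every corner of the VCF specification.

```python
import re

# KEY=VALUE pairs where VALUE is either a double-quoted string (which may
# contain commas, '=' or backslash-escaped quotes) or a bare token.
_ATTR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,>]*)')


def parse_meta_line(line: str) -> tuple[str, dict]:
    """Parse '##FORMAT=<ID=DP,...>' into ('FORMAT', {'ID': 'DP', ...})."""
    if not line.startswith("##") or "=<" not in line:
        raise ValueError(f"not a structured meta-information line: {line!r}")
    kind, _, body = line[2:].partition("=<")
    body = body.rstrip().removesuffix(">")
    attrs = {}
    for key, value in _ATTR.findall(body):
        if value.startswith('"') and value.endswith('"'):
            value = value[1:-1].replace('\\"', '"')  # strip quotes, unescape
        attrs[key] = value
    return kind, attrs


kind, attrs = parse_meta_line(
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read '
    'depth (reads with MQ=255 or with bad mates are filtered)">'
)
print(kind, attrs["ID"], attrs["Description"])
```

Quoted values are matched before bare tokens, so the "MQ=255" inside the description is not mistaken for a separate attribute.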
6.0 years ago

Hello,

Have a look at the header entries of your VCF file (the lines starting with ## at the top of the file). All fields are described there.
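If the header is long, a short script can collect just the INFO and FORMAT descriptions. Below is a sketch in Python. `open_vcf` and `header_descriptions` are hypothetical names. The sketch assumes ID is the first attribute of each line, as in every header line quoted in this thread, and it handles plain or .gz files.

```python
import gzip
import re

# Assumes ID comes first in INFO/FORMAT lines; Description is a
# double-quoted string that may contain commas.
_HEADER = re.compile(
    r'^##(INFO|FORMAT)=<ID=([^,>]+),.*?Description="((?:[^"\\]|\\.)*)"'
)


def open_vcf(path: str):
    """Open a plain or gzip-compressed (.gz) VCF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def header_descriptions(lines) -> dict:
    """Map (kind, ID) -> Description for every ##INFO / ##FORMAT line.

    Stops at the first line that does not start with '##' (normally the
    '#CHROM' column header), so variant records are never read.
    """
    descriptions = {}
    for line in lines:
        if not line.startswith("##"):
            break
        match = _HEADER.match(line)
        if match:
            kind, field_id, text = match.groups()
            descriptions[(kind, field_id)] = text
    return descriptions


# Small in-memory example using a header line quoted in this thread.
example = [
    "##fileformat=VCFv4.2\n",
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n",
]
print(header_descriptions(example))  # {('FORMAT', 'DP'): 'Read Depth'}
```

For a real file, call it inside `with open_vcf("your_file.vcf") as handle:` as `header_descriptions(handle)`; the file name is a placeholder.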

If you don't understand a description, feel free to ask.

fin swimmer


Hi,

Thank you for your reply. The problem is that I used third-party software to obtain the VCF (since I don't have a bioinformatics background), and I found these abbreviations in the .csv output I got after running the VCF through wANNOVAR. So I am not reading the VCF file, but the .csv file.


So I am not reading the VCF file, but the .csv file

It's like trying to re-create a cow from a steak.


Haha, true! But that's where I am. Anyway, thank you. :)

6.0 years ago
JJ ▴ 680

Have a look at this page; I think it explains the various entries neatly.


I wonder why you're pointing to the 1000 Genomes VCF v4.0 resource instead of the official VCF v4.2 or v4.3 documentation?

17 months ago

I encountered such a VCF in a support thread, and I still have no idea which third-party tool generates these. The VCF header does, however, describe these fields:

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=SR,Number=1,Type=Float,Description="Supporting reads">
##FORMAT=<ID=VR,Number=1,Type=Float,Description="Variant reads">
##FORMAT=<ID=VA,Number=1,Type=Integer,Description="1=Variant ambiguous, Genotype at this location is not clear, otherwise VA=0">
##FORMAT=<ID=SB,Number=1,Type=Float,Description="Strand bias">
##FORMAT=<ID=ABQ,Number=1,Type=Float,Description="Average base quality">
##FORMAT=<ID=AMQ,Number=1,Type=Float,Description="Average mapping quality">
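To line these descriptions up with the FORMAT and sample strings from the original question, here is a Python sketch; `describe_sample` and the constant names are hypothetical. It assumes the question's file carries this same header, which the thread does not confirm. Note that this header defines GQ but not NR, so NR comes out as undescribed.

```python
import re

# FORMAT header lines as quoted in this answer.
HEADER = """\
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=SR,Number=1,Type=Float,Description="Supporting reads">
##FORMAT=<ID=VR,Number=1,Type=Float,Description="Variant reads">
##FORMAT=<ID=VA,Number=1,Type=Integer,Description="1=Variant ambiguous, Genotype at this location is not clear, otherwise VA=0">
##FORMAT=<ID=SB,Number=1,Type=Float,Description="Strand bias">
##FORMAT=<ID=ABQ,Number=1,Type=Float,Description="Average base quality">
##FORMAT=<ID=AMQ,Number=1,Type=Float,Description="Average mapping quality">
"""

# FORMAT keys and sample values from the question (assumed to share this header).
FORMAT_KEYS = "GT:NR:DP:SR:VR:VA:SB:ABQ:AMQ"
SAMPLE = "1/0:468.73038:23:65.22:65.22:.:24.93:33.53:220.13"

# One match per ##FORMAT line: (ID, Description).
_FORMAT_LINE = re.compile(r'^##FORMAT=<ID=([^,>]+),.*?Description="([^"]*)"', re.M)


def describe_sample(header: str, keys: str, values: str) -> list:
    """Return (key, value, description) triples; unknown keys are flagged."""
    descriptions = dict(_FORMAT_LINE.findall(header))
    return [
        (key, value, descriptions.get(key, "(not described in header)"))
        for key, value in zip(keys.split(":"), values.split(":"))
    ]


for key, value, text in describe_sample(HEADER, FORMAT_KEYS, SAMPLE):
    print(f"{key:<4}{value:<12}{text}")
```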

Also peculiar: SR (Supporting reads) and VR (Variant reads) are not whole numbers. Instead, they look like a VAF (variant allele fraction) expressed as a percentage.
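One way to probe that hunch: if SR and VR were percentages of the sample-level DP (23 in the question), some whole number of reads k should give 100·k/23 ≈ 65.22 after rounding. Below is a small Python check under that assumption; `matching_read_counts` is a hypothetical helper.

```python
def matching_read_counts(percent: float, depth: int, decimals: int = 2) -> list:
    """Return every read count k in 0..depth with round(100*k/depth) == percent.

    This only tests numerical consistency; it cannot show how the
    (unknown) tool actually computed the value.
    """
    if depth <= 0:
        raise ValueError("depth must be a positive integer")
    target = round(percent, decimals)
    return [
        k for k in range(depth + 1)
        if round(100 * k / depth, decimals) == target
    ]


# Sample-level DP=23 and SR = VR = 65.22 in the question's sample column.
print(matching_read_counts(65.22, 23))  # [15]
```

Since 15/23 ≈ 65.217%, the values are at least compatible with 15 of 23 reads. That does not confirm how the tool computes SR or VR.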
