Weirdness in annotation (missing allele frequencies)
7 months ago

Hello everyone!

I am trying to get the gnomAD allele frequencies of the variants in my VCF files (I am sharing a screenshot of one example variant whose allele frequency I want to retrieve). I ran the annotation with the following protocols and operations:

protocols="gnomad211_genome,gnomad211_exome,clinvar_20221231,dbnsfp42a,avsnp150,refGene" operations="f,f,f,f,f,g"

I get two sets of AF values; I believe one is retrieved from the gnomad211_genome database and the other from the gnomad211_exome database. Here are the columns I get after the annotation (each name appears twice: 2 AF, 2 AF_popmax, 2 AF_male, etc.):

AF AF_popmax AF_male AF_female AF_raw AF_afr AF_sas AF_amr AF_eas AF_nfe AF_fin AF_asj AF_oth non_topmed_AF_popmax non_neuro_AF_popmax non_cancer_AF_popmax controls_AF_popmax
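Because every frequency column name appears twice, it can help to relabel the header before comparing values. The following is a minimal Python sketch, not part of ANNOVAR: the function name `label_duplicate_columns` and the suffixes `genome` and `exome` are hypothetical, and it assumes the n-th copy of a repeated column comes from the n-th protocol listed (gnomad211_genome first, gnomad211_exome second).

```python
from collections import Counter


def label_duplicate_columns(header, sources):
    """Append a source suffix to every column name that occurs more than once."""
    totals = Counter(header)   # how often each name occurs overall
    seen = Counter()           # how many copies of each name we have passed
    labelled = []
    for name in header:
        if totals[name] > 1:
            idx = seen[name]
            seen[name] += 1
            # Assumption: the n-th copy belongs to the n-th protocol in `sources`.
            suffix = sources[idx] if idx < len(sources) else f"copy{idx + 1}"
            labelled.append(f"{name}_{suffix}")
        else:
            labelled.append(name)
    return labelled


# Shortened, illustrative header; a real output file has many more columns.
header = ["Chr", "Start", "AF", "AF_popmax", "AF", "AF_popmax"]
labelled = label_duplicate_columns(header, ["genome", "exome"])
print(labelled)
```

With the columns relabelled this way, the genome and exome values can be read side by side instead of guessing which `AF` is which.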

But none of these columns has the value that I want (the one shown in the screenshot I share). How can I get the gnomAD allele frequencies of the variants?

Another weird thing is that some variants have very low AF values, yet when I look them up in the gnomAD browser, they do not exist there. How can a variant get a frequency value if it is not in the gnomAD database, and should I trust these very low values?

Thank you all!

one example variant

the attributes of the variant with the Allele Frequency I want to have with annotation

Notice that the allele frequency is 0.2391

allele-frequency gnomad annovar • 1.3k views

Before we dive in too deep, let's confirm we are on the same page: gnomAD 2.1.1 (in your text) is on GRCh37/hg19, while gnomAD 3.1.2 (in your screenshot) is on GRCh38/hg38. ANNOVAR may be lifting 2.1.1 over to hg38, but it won't match 3.1.2.


Thanks for pointing out. Let me explain:

I had VCF files whose variants were at hg19 chromosome positions (the build used by gnomAD 2.1.1). I performed a liftover with the Picard liftover tool, converting the variant positions from hg19 to hg38, and then annotated the resulting hg38 VCF files with ANNOVAR. My supervisor suggested that I specifically use the gnomad211_genome and gnomad211_exome protocols for the annotation because they are better studied and may include more individuals. The annotation went well, but some variants have very low frequency values. The screenshot was just an example, and whether it is hg38 or hg19 does not matter here: I looked up the variants with very low frequency values in gnomAD using both their hg19 and hg38 positions and could not find them; they do not exist there. Briefly, my questions are:

  1. How can I get the gnomAD allele frequencies of these variants (and of all variants in general), given that the annotation provided so many frequencies but not the one I want?
  2. What does it mean to get a very low frequency value after annotation if the variant does not exist in the gnomAD database? How is that possible?

You performed a liftover on your VCF and then used a liftover-gnomad211 for the annotation? Why not simply use the regular gnomad211 with your hg19 callset? I use the gnomad liftover because my calls are done using hg38 and gnomAD 3.x is not as robust as 2.x yet, but you have no reason to do a double liftover.
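As a rough illustration of the single-build approach, the Python sketch below only assembles a `table_annovar.pl` argument list for an hg19 callset; it does not run anything. The file names `calls.hg19.vcf`, `humandb/`, and `annotated` are hypothetical placeholders, and the helper `build_table_annovar_cmd` is invented for this example. It also checks that the protocol and operation lists have the same length, which the comma-separated strings in the question must satisfy.

```python
def build_table_annovar_cmd(vcf, humandb, buildver, protocols, operations, out_prefix):
    """Return an argv list for table_annovar.pl without executing it."""
    if len(protocols) != len(operations):
        raise ValueError("each protocol needs exactly one operation")
    return [
        "table_annovar.pl", vcf, humandb,
        "-buildver", buildver,          # must match the build of the input VCF
        "-out", out_prefix,
        "-remove",                      # remove temporary files
        "-protocol", ",".join(protocols),
        "-operation", ",".join(operations),
        "-nastring", ".",               # placeholder for missing annotations
        "-vcfinput",
    ]


protocols = ["gnomad211_genome", "gnomad211_exome", "clinvar_20221231",
             "dbnsfp42a", "avsnp150", "refGene"]
operations = ["f", "f", "f", "f", "f", "g"]

# Hypothetical paths: an hg19 callset annotated against hg19 databases.
cmd = build_table_annovar_cmd("calls.hg19.vcf", "humandb/", "hg19",
                              protocols, operations, "annotated")
print(" ".join(cmd))
```

The point of the sketch is that the callset and `-buildver` refer to the same genome build, so no liftover is involved on either side.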


Thank you so much, I decided to move forward as you suggested. Initially, I thought using the most up-to-date databases and positions would give a better-quality annotation, but as you said, doing the liftover and working from that was not a sound approach.


There are plenty of variants that don't exist in gnomAD - only 759M SNPs (chr-pos-ref-alt) out of a possible 9B. Your question is a bit confusing to me because I don't know how you are getting any frequency values for variants that are not in gnomAD. They should just be NULL.
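To make the two points above concrete, here is a small Python sketch. It assumes the annotation output uses `.` as its missing-value placeholder (a common setting, but check your own run), and the helper `parse_af` is hypothetical. The 759M and 9B figures are taken from the reply above, not recomputed.

```python
def parse_af(value, na_string="."):
    """Convert an annotated AF cell to a float, or None when it is missing."""
    value = value.strip()
    if value == "" or value == na_string:
        return None  # variant absent from the database: no frequency at all
    return float(value)


# Figures quoted in the reply above.
observed_snps = 759_000_000
possible_snps = 9_000_000_000
fraction = observed_snps / possible_snps
print(f"{fraction:.1%} of possible SNPs have a gnomAD record")

print(parse_af("."))        # missing value
print(parse_af("0.2391"))   # present value
```

Treating the placeholder as missing, rather than as zero, keeps absent variants from being mistaken for very rare ones.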


Yes, I was confused as well, and I think the same. Maybe ANNOVAR uses some other resources besides gnomAD, but that would not make sense. Also, I get the following columns from the annotation I performed (I provide an example row). Can you tell me what they stand for?

Otherinfo1  Otherinfo2  Otherinfo3  Otherinfo4  Otherinfo5  Otherinfo6  Otherinfo7  Otherinfo8  Otherinfo9  Otherinfo10 Otherinfo11 Otherinfo12 Otherinfo13
0.5 338.165 163 chr1    155340778   .   GA  G   338.165 PASS    AF=0.60625;AO=86;DP=163;FAO=97;FDP=160;FR=.;FRO=63;FSAF=46;FSAR=51;FSRF=24;FSRR=39;FWDB=-0.264896;FXX=0.0184038;HRUN=8;LEN=1;MLLD=13.7233;OALT=-;OID=.;OMAPALT=G;OPOS=155340779;OREF=A;PB=0.5;PBP=1;QD=8.45413;RBI=0.277578;REFB=-0.000902674;REVB=0.0829417;RO=56;SAF=38;SAR=48;SRF=18;SRR=38;SSEN=0;SSEP=0;SSSB=0.0899341;STB=0.536957;STBP=0.248;TYPE=del;VARB=-0.00188567   GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT    0/1:162:163:160:56:63:86:97:0.60625:48:38:18:38:51:46:24:39:1

I am asking because these columns seem to repeat the classical annotation columns (Chr, Start, End, Ref, Alt, ...), but they are appended at the end of the annotation. Also, the Chr and Start positions in them differ from those in the first columns.

Here are the protocols I used:

protocols="gnomad211_genome,gnomad211_exome,clinvar_20221231,dbnsfp42a,avsnp150,refGene"

Thank you.
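One possible reading of the Otherinfo columns, sketched in Python: the first three look like a zygosity code, a quality value, and a depth, and the remaining ten look like the original VCF line (CHROM through the sample column). This layout is an assumption inferred from the example row, where Otherinfo2 equals QUAL (338.165) and Otherinfo3 equals DP (163); the names in `OTHERINFO_NAMES` and both helper functions are hypothetical. The second helper shows how trimming the base shared by REF and ALT shifts a deletion's start by one, which would explain positions that differ from the VCF.

```python
# Assumed meaning of Otherinfo1..Otherinfo13 (inferred from the example row).
OTHERINFO_NAMES = ["zygosity_code", "qual", "depth",
                   "CHROM", "POS", "ID", "REF", "ALT",
                   "QUAL", "FILTER", "INFO", "FORMAT", "SAMPLE"]


def name_otherinfo(values):
    """Pair each Otherinfo value with its assumed name."""
    return dict(zip(OTHERINFO_NAMES, values))


def trim_shared_prefix(pos, ref, alt):
    """Drop leading bases shared by REF and ALT, shifting the position."""
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref or "-", alt or "-"


# INFO, FORMAT and SAMPLE are shortened here for readability.
row = ["0.5", "338.165", "163", "chr1", "155340778", ".", "GA", "G",
       "338.165", "PASS", "AF=0.60625;DP=163", "GT", "0/1"]
named = name_otherinfo(row)
trimmed = trim_shared_prefix(int(named["POS"]), named["REF"], named["ALT"])
print(named["CHROM"], named["POS"], named["REF"], named["ALT"])
print(trimmed)
```

For the example row, trimming the shared `G` gives position 155340779 with `A` deleted, which agrees with the `OPOS=155340779;OREF=A;OALT=-` entries in the same record.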


0.60625 is 97/160 (FAO/FDP in that row), so if you have 80 individuals in your VCF, that's an internal allele frequency, not a gnomAD one.
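The arithmetic can be checked directly from the record. This is a minimal Python sketch with a hypothetical helper `parse_info`; the INFO string is shortened to the fields needed for the check.

```python
import math


def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Shortened INFO field from the example row.
info = "AF=0.60625;AO=86;DP=163;FAO=97;FDP=160"
fields = parse_info(info)

ratio = int(fields["FAO"]) / int(fields["FDP"])
print(ratio)

# The AF in INFO equals FAO/FDP, so it is computed from the callset itself.
matches = math.isclose(ratio, float(fields["AF"]))
print(matches)
```

Since the value can be rebuilt from counts in the same record, it comes from the input VCF and is carried along in the Otherinfo columns rather than looked up in gnomAD.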
