Using VEP to get gnomAD frequencies
2.8 years ago

Hi all,

I am using Ensembl VEP (command line) to annotate a VCF. I am specifically looking for gnomAD allele frequencies, which is fairly straightforward to do, technically speaking. However, the data looks off in some cases.

For example, when I pass in:

10  69408929    COSM3751912 A   T   .   .   GENE=TACR2;STRAND=-;CDS=c.734T>A;AA=p.M245K


I get the VEP output:

COSM3751912 10:69408929 T   ENSG00000075073 ENST00000373306 Transcript  missense_variant    1278    734 245 M/K aTg/aAg rs55953810  MODERATE    -   -1  -   TACR2   HGNC    HGNC:11527  YES ENSP00000362403 P21452  -   UPI0000061EE3   -   3/5 -   Gene3D:1.20.1070.10,Pfam_domain:PF00001,PROSITE_profiles:PS50262,hmmpanther:PTHR43919,hmmpanther:PTHR43919:SF4,SMART_domains:SM01381,Superfamily_domains:SSF81321,Conserved_Domains:cd16004 1   1   1   1   1   0.9999  0.9999  0.9999  1   0.9999  1   0.9999  1   1   1   gnomAD_ASJ,gnomAD_FIN,gnomAD_OTH,gnomAD_SAS,AFR,AMR,EAS,EUR,SAS -   -   -   -   -   -   -


Working through that output, you can see that gnomAD_AF is 0.9999. This seems odd to me. How could this variant be a COSMIC (cancer database) missense variant, with MODERATE consequence, and have a frequency of 99.99%? I'm lost on how to interpret this.

Maybe I am misunderstanding how gnomAD scores allele frequencies, hence posting this question here.

Does anyone know how gnomAD allele frequencies (as output by Ensembl's VEP) should be interpreted?

annotation frequency allele frequency vep gnomAD • 2.5k views

For some reason, it's giving you the 1-AF and not the AF. The variant at the chromosome level is an A>T with AF < 0.01, but at the transcript level becomes a T>A. Maybe that's the reason the AF is being inverted?


Interesting. Could that be because the strand is -1?


Answered my own question by looking at more data. Nope, there are plenty of +1 strand entries with the same frequency issue as the first example.


I did notice that as well. So why would VEP be returning anything then? Seems odd (again). There are other entries with a - in place of the AF (I'm assuming that means no data).

2.8 years ago

The reference allele, in this case, appears to be a very rare allele. The reference allele is whatever was found in the reference genome, which is a genomic region of a real person, which means that it can be a rare or private allele. In this case, the reference, A, is very rare.

The VEP is giving you the allele frequency of the alternative allele for the variant at this locus, which is rs55953810. The alternative allele given in your VEP input is T, so the allele frequency it gives you is for T.
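
To make the arithmetic concrete, here is a minimal Python sketch. It assumes a biallelic site, so the reference and alternative allele frequencies sum to 1. The function names and the 0.5 threshold are illustrative choices, not part of VEP or gnomAD.

```python
def ref_allele_frequency(alt_af: float) -> float:
    """Frequency of the REF allele at a biallelic site, given the ALT frequency."""
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError(f"allele frequency out of range: {alt_af}")
    return 1.0 - alt_af


def reference_is_minor_allele(alt_af: float, threshold: float = 0.5) -> bool:
    """True when the ALT allele is more common than the REF allele."""
    return alt_af > threshold


# gnomAD_AF reported by VEP for the ALT allele (T) in the question.
gnomad_af = 0.9999
ref_af = ref_allele_frequency(gnomad_af)

print(f"ALT (T) AF = {gnomad_af}, REF (A) AF = {ref_af:.4f}")
print("Reference is the minor allele:", reference_is_minor_allele(gnomad_af))
```

Filtering on a check like this before interpreting a frequency as "common" or "rare" helps catch sites where the reference genome happens to carry the rare allele.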

VEP reads VCF format as standard, assuming that the alleles are the forward strand alleles. Your specification of strand in the INFO column is ignored because this is not the standard way to write VCF. This is why your alleles have not been converted.
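
One way to check whether a record's REF and ALT are already written on the forward strand is to compare them with the complemented CDS change. The sketch below is plain Python and only handles single-base substitutions. It assumes the INFO keys `STRAND` and `CDS` look like those in the record from the question; the helper names are hypothetical.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'GENE=X;STRAND=-' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def forward_strand_alleles(cds_change: str, strand: str) -> tuple:
    """Return (ref, alt) on the forward strand for a change like 'c.734T>A'."""
    match = re.fullmatch(r"c\.\d+([ACGT])>([ACGT])", cds_change)
    if match is None:
        raise ValueError(f"not a single-base substitution: {cds_change}")
    ref, alt = match.groups()
    if strand == "-":
        # A transcript on the minus strand reads the complement of the forward strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return ref, alt


# The record from the question, with tab-separated columns.
record = "10\t69408929\tCOSM3751912\tA\tT\t.\t.\tGENE=TACR2;STRAND=-;CDS=c.734T>A;AA=p.M245K"
chrom, pos, variant_id, ref, alt, qual, filt, info = record.split("\t")

fields = parse_info(info)
expected = forward_strand_alleles(fields["CDS"], fields["STRAND"])
is_forward = (ref, alt) == expected
print("REF/ALT match the forward strand:", is_forward)
```

For the record in the question the check comes out true: c.734T>A on the minus strand corresponds to A>T on the forward strand, which matches the REF and ALT columns. That suggests this particular record may already be written with forward-strand alleles, although a single record says nothing about the rest of the file.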

It appears that you ran your VEP input against GRCh38. gnomAD coordinates are on GRCh37 and are remapped onto GRCh38, so that VEP can look them up to find the allele frequencies. This is why there is no variant at that locus in gnomAD itself, even though the variant identifier and its frequencies do exist.


This is a big help in understanding what's going on. Thank you. Would you recommend converting the VCF to have (+) strand alleles only? As a side note, this VCF was downloaded directly from COSMIC, so I suppose it makes sense that the "ref" alleles might be super rare.


Yes, I think that standard convention states that VCF should have only forward strand alleles. Perhaps this is something to raise with COSMIC if they are putting reverse strand alleles in their VCFs.

11 weeks ago
Kalin • 0

I created a Python package based on SQLite databases that lets you query all gnomAD variants for GRCh37/38 without having to install VEP: https://github.com/KalinNonchev/gnomAD_MAF. Precomputed SQLite databases for gnomAD WGS (GRCh37/38) are linked in the package description. Please take a look there.

Best,