Question: Using VEP to get gnomAD frequencies
4 months ago, brett.spurrier wrote:

Hi all,

I am using Ensembl VEP (command line) to annotate a VCF I have. I am specifically looking for gnomAD allele frequencies, which is fairly straightforward to do, technically speaking. However, the data looks off in some cases.

For example, when I pass in:

10  69408929    COSM3751912 A   T   .   .   GENE=TACR2;STRAND=-;CDS=c.734T>A;AA=p.M245K

I get the VEP output:

COSM3751912 10:69408929 T   ENSG00000075073 ENST00000373306 Transcript  missense_variant    1278    734 245 M/K aTg/aAg rs55953810  MODERATE    -   -1  -   TACR2   HGNC    HGNC:11527  YES ENSP00000362403 P21452  -   UPI0000061EE3   -   3/5 -   Gene3D:1.20.1070.10,Pfam_domain:PF00001,PROSITE_profiles:PS50262,hmmpanther:PTHR43919,hmmpanther:PTHR43919:SF4,SMART_domains:SM01381,Superfamily_domains:SSF81321,Conserved_Domains:cd16004 1   1   1   1   1   0.9999  0.9999  0.9999  1   0.9999  1   0.9999  1   1   1   gnomAD_ASJ,gnomAD_FIN,gnomAD_OTH,gnomAD_SAS,AFR,AMR,EAS,EUR,SAS -   -   -   -   -   -   -

Looking through that, you can see that the gnomAD_AF allele frequency is 0.9999. This seems odd to me. How could this variant be a COSMIC (cancer database) missense variant with MODERATE impact and still have a 99.99% frequency? I'm lost on how to interpret this.

Maybe I am misunderstanding how gnomAD scores allele frequencies, hence posting this question here.

Does anyone know how gnomAD allele frequencies (as output by Ensembl's VEP) should be interpreted?


For some reason, it's giving you the 1-AF and not the AF. The variant at the chromosome level is an A>T with AF < 0.01, but at the transcript level becomes a T>A. Maybe that's the reason the AF is being inverted?

Reply written 4 months ago by RamRS

Interesting. Could that be because the strand is -1?

Reply written 4 months ago by brett.spurrier

Answered my own question by looking at more data. Nope, there are plenty of +1 strand entries with the same frequency issue as the first example.

Reply written 4 months ago by brett.spurrier

There is no variant in gnomAD at this position: http://gnomad.broadinstitute.org/variant/10-69408929-A-T

Reply written 4 months ago by Pierre Lindenbaum

I did notice that as well. So why would VEP be returning anything then? Seems odd (again). There are other entries with a - in place of the AF (I'm assuming that means no data).
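When post-processing VEP's tab-delimited output, a field containing "-" can be treated as "no value" rather than as a number. A minimal sketch follows; the helper name `parse_vep_frequency` is hypothetical and not part of VEP.

```python
from typing import Optional


def parse_vep_frequency(field: str) -> Optional[float]:
    """Convert one VEP frequency field to a float, or None when it is empty.

    Assumption: an empty field is written as "-" (or as an empty string).
    """
    field = field.strip()
    if field in ("", "-"):
        return None
    return float(field)


# Example values in the style of the output above.
raw_fields = ["0.9999", "-", "1"]
parsed = [parse_vep_frequency(value) for value in raw_fields]
print(parsed)
```

Keeping missing values as `None` avoids confusing "not reported" with a frequency of zero.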

Reply written 4 months ago by brett.spurrier
4 months ago, Emily_Ensembl (EMBL-EBI) wrote:

The reference allele, in this case, appears to be a very rare allele. The reference allele is whatever was found in the reference genome. Each region of the reference genome comes from a real person, so the reference allele can be a rare or even private allele. In this case, the reference, A, is very rare.

The VEP is giving you the allele frequency of the alternative allele for the variant at this locus, which is rs55953810. The alternative allele given in your VEP input is T, so the allele frequency it gives you is for T.
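To make the arithmetic concrete, here is a small sketch. It assumes the site is biallelic, so the reference and alternative allele frequencies sum to 1; the function name is hypothetical and used only for illustration.

```python
def reference_allele_frequency(alt_af: float) -> float:
    """Return the implied reference-allele frequency at a biallelic site.

    Assumption: only two alleles exist at the site, so the reference
    frequency is 1 minus the alternative-allele frequency.
    """
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError(f"allele frequency out of range: {alt_af}")
    return 1.0 - alt_af


gnomad_af_alt = 0.9999  # gnomAD_AF reported by VEP for the alternative allele T
gnomad_af_ref = reference_allele_frequency(gnomad_af_alt)
print(f"T (ALT): {gnomad_af_alt:.4f}  A (REF): {gnomad_af_ref:.4f}")
```

Under that assumption, an alternative-allele frequency of 0.9999 corresponds to a reference-allele frequency of roughly 0.0001, which is why the reference A is the rare allele here.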

VEP reads VCF format as standard, assuming that the alleles are the forward strand alleles. Your specification of strand in the INFO column is ignored because this is not the standard way to write VCF. This is why your alleles have not been converted.
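One way to check how a record's REF/ALT relate to its strand annotation is to convert the CDS change to the forward strand and compare. The sketch below is not part of VEP. It assumes the COSMIC-style INFO keys `STRAND` and `CDS` seen in the input line above, that the CDS change is written on the transcript strand, and that the change is a simple single-base substitution; the function names are hypothetical.

```python
import re

# For a single-base change, the reverse complement is just the complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_info(info):
    """Split a VCF INFO string into a dict of KEY=VALUE pairs."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def cds_snv_on_forward_strand(cds, strand):
    """Return (ref, alt) on the forward strand for a CDS change like c.734T>A."""
    match = re.search(r"([ACGT])>([ACGT])$", cds)
    if match is None:
        raise ValueError(f"not a simple substitution: {cds!r}")
    ref, alt = match.groups()
    if strand == "-":
        # Transcript is on the reverse strand, so complement both bases.
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return ref, alt


# The input record from the question, with tab-separated VCF columns.
record = "10\t69408929\tCOSM3751912\tA\tT\t.\t.\tGENE=TACR2;STRAND=-;CDS=c.734T>A;AA=p.M245K"
chrom, pos, var_id, ref, alt, qual, flt, info = record.split("\t")
info_fields = parse_info(info)

forward = cds_snv_on_forward_strand(info_fields["CDS"], info_fields["STRAND"])
already_forward = forward == (ref, alt)
print(f"{var_id}: REF>ALT = {ref}>{alt}, "
      f"CDS on forward strand = {forward[0]}>{forward[1]}, match = {already_forward}")
```

For the record in the question, the minus-strand change T>A corresponds to A>T on the forward strand, which is the same as the REF and ALT columns. Under the assumptions above, that suggests the REF/ALT columns of this record are already written on the forward strand.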

It appears that you have run your VEP input against GRCh38. gnomAD coordinates are on GRCh37 and are remapped onto GRCh38, so the VEP can look them up to find the allele frequencies. This is why there is no variant at that coordinate in gnomAD itself, even though the variant identifier and its frequencies do exist.


This is a big help in understanding what's going on. Thank you. Would you recommend converting the VCF to have (+) strand alleles only? As a side note, this VCF is directly downloaded from COSMIC, so I suppose it makes sense that the "ref" alleles might be super rare.

Reply written 4 months ago by brett.spurrier

Yes, I think that standard convention states that VCF should have only forward strand alleles. Perhaps this is something to raise with COSMIC if they are putting reverse strand alleles in their VCFs.

Reply written 4 months ago by Emily_Ensembl