Question

Why does the CADD database have multiple lines for the same mutation/substitution with different gene IDs?

22 months ago

4galaxy77 2.8k

I grepped the position 12_111803962_G_A out of the CADD database and it returned this:

12_111803962_G_A        32      Intergenic      DOWNSTREAM      ENSG00000274697 ENST00000617899
12_111803962_G_A        32      CodingTranscript        NON_SYNONYMOUS  ENSG00000111275 ENST00000261733

I'm confused as to why there are multiple lines for this specific mutation.

If I look it up on NCBI, it says the mutation is in the ALDH2 gene, as expected. This maps to ENSG00000111275, which is the second entry in my grep results above. However, the first entry maps to ENSG00000274697, which is a different gene, MIR6761, which I believe is next door to ALDH2.

This seems very confusing to me - the position 12_111803962 isn't in the MIR6761 gene, so why is it annotated with that gene?

score 2 · Accepted Answer · 2022-06-28

22 months ago

tomas4482 ▴ 390

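This variant is annotated by Ensembl VEP as a downstream variant of ENSG00000274697. VEP reports a consequence for every transcript that a variant overlaps or lies close to (by default within 5 kb upstream or downstream), so the same substitution gets one line per affected gene: NON_SYNONYMOUS for ENSG00000111275 (ALDH2) and DOWNSTREAM for ENSG00000274697 (MIR6761).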
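
If it helps, here is a minimal Python sketch for collecting every annotation line that belongs to one variant, so the per-gene consequences can be seen side by side. It assumes the six whitespace-separated columns in the grep output are variant key, score, annotation type, consequence, gene ID and transcript ID; that column order is inferred from the two lines in the question, and `group_by_variant` is just an illustrative helper name, not part of CADD or VEP.

```python
from collections import defaultdict

# The two lines quoted in the question. Column meanings are assumed:
# variant, score, annotation type, consequence, gene ID, transcript ID.
cadd_lines = """\
12_111803962_G_A        32      Intergenic      DOWNSTREAM      ENSG00000274697 ENST00000617899
12_111803962_G_A        32      CodingTranscript        NON_SYNONYMOUS  ENSG00000111275 ENST00000261733
"""


def group_by_variant(text):
    """Map each variant key to a list of (gene_id, consequence, score) tuples."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        variant, score, _anno_type, consequence, gene_id, _transcript_id = line.split()
        grouped[variant].append((gene_id, consequence, float(score)))
    return dict(grouped)


annotations = group_by_variant(cadd_lines)
for variant, rows in annotations.items():
    for gene_id, consequence, score in rows:
        print(variant, gene_id, consequence, score)
```

Both rows carry the same score, since the score belongs to the substitution itself; only the gene-level annotation differs between the lines. If you only care about one gene, filter the grouped rows on its gene ID.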
