Why does the CADD database have multiple lines for the same mutation/substitution with different gene IDs?
7 weeks ago
4galaxy77

I grepped position 12_111803962_G_A out of the CADD database, and it returned this:

12_111803962_G_A        32      Intergenic      DOWNSTREAM      ENSG00000274697 ENST00000617899
12_111803962_G_A        32      CodingTranscript        NON_SYNONYMOUS  ENSG00000111275 ENST00000261733
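For illustration, grouping rows like these by their variant key makes the structure visible: one variant, one row per annotated gene/transcript. This is a minimal sketch, not CADD tooling. It assumes only the six columns shown above (variant key, score, AnnoType, Consequence, GeneID, FeatureID), whereas a real CADD annotation file has many more. The helper names `group_by_variant` and `overlapping_genes` are hypothetical. The filter also assumes, as in these two rows, that `Intergenic` rows describe nearby rather than overlapping features.

```python
from collections import defaultdict

# The two rows returned by the grep above. Assumption: only these six columns
# (variant key, score, AnnoType, Consequence, GeneID, FeatureID) are present.
ROWS = """\
12_111803962_G_A 32 Intergenic DOWNSTREAM ENSG00000274697 ENST00000617899
12_111803962_G_A 32 CodingTranscript NON_SYNONYMOUS ENSG00000111275 ENST00000261733
"""


def group_by_variant(text):
    """Collect every (AnnoType, Consequence, GeneID, FeatureID) row per variant key."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # split() on any whitespace, so tab-separated file lines and
        # space-padded pasted output both parse the same way.
        key, _score, anno_type, consequence, gene, feature = line.split()
        grouped[key].append((anno_type, consequence, gene, feature))
    return dict(grouped)


def overlapping_genes(annotations):
    """Gene IDs from non-Intergenic rows (assumed: genes the variant lies in)."""
    return sorted({gene for anno_type, _c, gene, _f in annotations if anno_type != "Intergenic"})


grouped = group_by_variant(ROWS)
for key, annotations in grouped.items():
    print(key, len(annotations), "annotations")
    print("  lies in:", overlapping_genes(annotations))
```

On the two rows above this reports two annotations for the variant, of which only ENSG00000111275 (ALDH2) is a gene the variant lies in.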


I'm confused as to why there are multiple lines for this specific mutation.

If I look it up on NCBI, it says the mutation is in the ALDH2 gene, as expected. This maps to ENSG00000111275, which is the second entry in my grep results above. However, the first entry maps to ENSG00000274697, which is a different gene, MIR6761, which I believe is adjacent to ALDH2.

This is confusing to me: position 12_111803962 isn't in the MIR6761 gene, so why does it map there?

7 weeks ago
tomas4482

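This variant is annotated by Ensembl VEP as a downstream variant of ENSG00000274697. VEP reports a consequence not only for transcripts that overlap a variant but also for transcripts within a set distance of it (5 kb by default), and CADD writes one line per such annotation. So the variant gets a NON_SYNONYMOUS line for ALDH2 (ENST00000261733), which it lies in, and a DOWNSTREAM line for MIR6761 (ENST00000617899), which it is merely close to.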
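The upstream/downstream logic can be sketched as a strand-aware distance check against each transcript. This is a simplified illustration, not VEP's code: it assumes 1-based inclusive coordinates and VEP's default 5000 bp window (CADD's own VEP settings may differ). The function name `flank_consequence` and the `"WITHIN"` label are hypothetical, and the coordinates in the usage are made up rather than the real ALDH2 or MIR6761 positions.

```python
def flank_consequence(pos, tx_start, tx_end, strand, distance=5000):
    """Classify a position relative to one transcript (simplified, VEP-style).

    Returns "WITHIN" (hypothetical label) if the position overlaps the
    transcript, "UPSTREAM"/"DOWNSTREAM" if it lies within `distance` bp of the
    5'/3' end, taking strand into account, otherwise None.
    Coordinates are 1-based and inclusive.
    """
    if tx_start <= pos <= tx_end:
        return "WITHIN"
    if strand == "+":
        # Plus strand: the 3' end is tx_end, so "downstream" is to the right.
        if tx_end < pos <= tx_end + distance:
            return "DOWNSTREAM"
        if tx_start - distance <= pos < tx_start:
            return "UPSTREAM"
    elif strand == "-":
        # Minus strand: the 3' end is tx_start, so "downstream" is to the left.
        if tx_start - distance <= pos < tx_start:
            return "DOWNSTREAM"
        if tx_end < pos <= tx_end + distance:
            return "UPSTREAM"
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return None


# Hypothetical short transcript at 10,000-10,100 and a variant 1,900 bp past its end.
print(flank_consequence(12_000, 10_000, 10_100, "+"))  # DOWNSTREAM
print(flank_consequence(12_000, 10_000, 10_100, "-"))  # UPSTREAM
print(flank_consequence(20_000, 10_000, 10_100, "+"))  # None (outside the window)
```

Running this check against every nearby transcript, rather than only the overlapping one, is why a single variant can end up with one line for the gene it sits in and extra lines for its neighbours.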