Question

How To Interpret Fusion Genes In HGVS Format From COSMIC Database?

5


10.9 years ago

Hmm ▴ 500

For example, what does this mean exactly:

SRGAP3{NM_014850.1}:r.1_1538_SRGAP3{NM_014850.1}:r.1511_1538_RAF1{NM_002880}:r.964_2977

or

EML4{ENST00000318522}:r.1_1943_ALK{NM_004304}:r.4151_6222

There is some documentation at the following websites, but I still do not get what the fusion gene descriptions mean exactly. Do they give the exact breakpoint? I understand r.1 to mean the first exon, but what is 1538 or 1943?

References where some info can be found: http://www.hgvs.org/mutnomen/recs-DNA.html and the COSMIC help page. Thanks.

fusion genes • 4.9k views

ADD COMMENT • link updated 8.1 years ago by Jessica L • 0 • written 10.9 years ago by Hmm ▴ 500

0


This is a very old post with a fair number of votes. My understanding of the HGVS notation is: 1) SRGAP3{NM_014850.1}:r.1_1538_SRGAP3{NM_014850.1}:r.1511_1538_RAF1{NM_002880}:r.964_2977 denotes a fusion transcript made of the first 1538 bases of transcript NM_014850.1, a short RNA piece from the same transcript (NM_014850.1) spanning bases 1511 to 1538, and part of another transcript, NM_002880, running from position 964 to position 2977.

2) EML4{ENST00000318522}:r.1_1943_ALK{NM_004304}:r.4151_6222 is a fusion transcript between the EML4 and ALK genes; the transcripts involved are ENST00000318522 and NM_004304. The first 1943 bases of transcript ENST00000318522 (EML4) are fused to the part of transcript NM_004304 spanning bases 4151 to 6222. I am not sure why identifiers from two different databases are used for the transcripts (Ensembl in one case, NCBI in the other).

ADD REPLY • link 6.6 years ago by cpad0112 21k

score 0 · Answer 1 · 2016-03-10

I realize this is an old post, but my current project requires me to pay closer attention to RNA fusion nomenclature, so I've been running into the same problem as the OP (I actually came across this post while googling HGVS r. notation). Here's what I've been able to learn (corrections would be appreciated):

The second example is more straightforward than the first: bases 1 through 1943 from EML4 (Ensembl transcript ID ENST00000318522) are joined to bases 4151 through 6222 from ALK (NCBI RefSeq accession NM_004304).

The catch with r. notation is that it doesn't always tell you which exon numbers are included in the fusion event, so you have to look up additional information, like whether base 1943 is the end of EML4 exon 1 or exon 2, if you care about that kind of information (which I do).
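If you do want the exon, one way to do the lookup is sketched below. It assumes you have already pulled the exon lengths for the transcript from an annotation source and converted them to cumulative end positions in transcript (r.) coordinates. The numbers in `TOY_EXON_ENDS` are made up for illustration and are not real EML4 or ALK coordinates, and `exon_for_position` is a hypothetical helper name.

```python
from bisect import bisect_left
from typing import List, Tuple

# PLACEHOLDER data: cumulative exon end positions in transcript coordinates.
# These are invented numbers, not taken from any real transcript.
TOY_EXON_ENDS = [120, 310, 560, 900]


def exon_for_position(pos: int, exon_ends: List[int]) -> Tuple[int, bool]:
    """Return (1-based exon number, True if pos is the last base of that exon)."""
    if pos < 1 or pos > exon_ends[-1]:
        raise ValueError(f"position {pos} is outside the transcript")
    # The first exon whose end is >= pos is the exon containing pos.
    idx = bisect_left(exon_ends, pos)
    return idx + 1, pos == exon_ends[idx]


exon, at_boundary = exon_for_position(310, TOY_EXON_ENDS)
print(exon, at_boundary)
```

A breakpoint that lands exactly on an exon end suggests an exon-to-exon junction, while one in the middle of an exon would need a closer look.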

The first example appears to be something like: bases 1 through 1538 of SRGAP3 are joined to bases 1511 through 1538 (a small 28 bp duplication, counting both endpoints, maybe?), and that is joined to bases 964 through 2977 of RAF1.
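For anyone who wants to pull these descriptions apart programmatically, here is a minimal sketch. It only handles the plain `GENE{transcript}:r.start_end` pieces seen in the two examples in this thread; any other syntax COSMIC may use is not covered. `Segment` and `parse_fusion` are names I made up, not part of any library.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    gene: str
    transcript: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Both endpoints are included, so 1511..1538 spans 28 bases.
        return self.end - self.start + 1


# One piece looks like GENE{TRANSCRIPT}:r.START_END; pieces are joined by "_".
SEGMENT_RE = re.compile(r"([A-Za-z0-9.\-]+)\{([^}]+)\}:r\.(\d+)_(\d+)")


def parse_fusion(description: str) -> List[Segment]:
    """Split a fusion description into its ordered transcript segments."""
    return [
        Segment(gene, transcript, int(start), int(end))
        for gene, transcript, start, end in SEGMENT_RE.findall(description)
    ]


example = ("SRGAP3{NM_014850.1}:r.1_1538_SRGAP3{NM_014850.1}:r.1511_1538"
           "_RAF1{NM_002880}:r.964_2977")
for seg in parse_fusion(example):
    print(seg.gene, seg.transcript, seg.start, seg.end, seg.length)
```

The segments come back in 5' to 3' order as written, so the junctions are simply between consecutive entries in the list.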