Annotating gene fusion predictions
6.0 years ago

I've got a list of SV calls, including breakpoints, and can easily enough winnow them down to those that are candidate gene fusions (intersect gene body or intron, same direction, strand, etc).

Now I'd like to know whether they're predicted to give in-frame or out-of-frame fusions. So far, I've been unable to find an annotation tool that does this in a straightforward manner. Visualization would also be nice, but isn't necessary.

Any suggestions?

sv fusion gene structuralvariant annotation
6.0 years ago

I had good results using PRADA (suggested by Roel Verhaak on Twitter). Specifically, the prada-frame command takes simple input that I already had (gene name, breakpoint location) and spits out a list of consequences for every matching transcript.

6.0 years ago

Have you tried this one?

Oncofuse: Prediction of Driver Gene Fusions from NGS Data

Note that it is most straightforward to use with input from some of the popular fusion detection tools, so you need to get your data into the right format.


Thanks for the suggestion - I've got Oncofuse up and running and the output makes sense. I'm a bit concerned that it's dropping a large number of fusion candidates, though. I already have these events mapped to Ensembl transcripts and believe them to be valid. Are there options that will allow for retaining these events? Or is there a straightforward way to replace the RefSeq annotations with ones from Ensembl that may be more inclusive?


Indeed, it focuses on canonical transcripts from RefSeq, one per RefSeq gene. Extending Oncofuse to isoform level and dealing with junction mapping ambiguity will definitely require a substantial re-write. Adding other genes/transcripts will require additional rounds of annotation and feature selection.

As for your original post, I believe it is not that hard to write a script that tells you whether a junction joins exons that are in or out of frame. The tables for Ensembl genes and transcripts (Gencode V20/Ensembl 76) downloaded from the UCSC Genome Browser include exon frames. For each exon you compute the remainder, which for 0-based coordinates is (end - start + frame) % 3, and check whether the 5' exon's remainder matches the 3' exon's frame. Of course, the hard part is handling ambiguous cases.
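A minimal sketch of that remainder check in Python, under several assumptions: the breakpoints fall on exon boundaries, exon coordinates are 0-based half-open and already clipped to the coding region, and the frame is the value from the UCSC exonFrames column (0, 1 or 2, with -1 for non-coding exons). The function names and the (start, end, frame) tuple layout are made up for illustration and do not come from Oncofuse or PRADA.

```python
def exon_remainder(start, end, frame):
    """Return the codon phase reached after the last base of an exon.

    Assumes 0-based, half-open [start, end) coordinates clipped to the CDS,
    and a frame of 0, 1 or 2 as in the UCSC exonFrames column.
    """
    if frame not in (0, 1, 2):
        raise ValueError("exon has no coding frame")
    return (end - start + frame) % 3


def is_in_frame(five_prime_exon, three_prime_exon):
    """Compare the 5' exon remainder with the 3' exon frame.

    Each exon is a hypothetical (start, end, frame) tuple. Returns None
    when either exon is non-coding (frame -1), since the frame of the
    fusion cannot be decided from this check alone.
    """
    start, end, frame = five_prime_exon
    next_frame = three_prime_exon[2]
    if frame == -1 or next_frame == -1:
        return None
    return exon_remainder(start, end, frame) == next_frame


# Illustrative coordinates only: a 100 bp exon starting in frame 0
# leaves a remainder of 1, so the partner exon must have frame 1.
upstream = (100, 200, 0)
print(is_in_frame(upstream, (500, 650, 1)))
print(is_in_frame(upstream, (500, 650, 0)))
```

Breakpoints inside introns can be handled the same way by using the last exon upstream of the break in the 5' gene and the first exon downstream of it in the 3' gene. Breakpoints inside exons, or exons that are only partly coding, need the lengths adjusted before the comparison.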


Thanks for the response, Mikhail. I still have a handful of transcripts for which neither program annotates the frame, so your suggestions about calculating it myself may still come in handy. Thanks for a nice package and the advice!