Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data of tumor samples. It is based on the ultrafast STAR aligner and the post-alignment runtime is typically just ~2 minutes. Hence, fusion detection comes at virtually no cost, since the alignment of FastQ reads is a task that needs to be done anyway in a typical RNA-Seq workflow.
Arriba has been submitted to the DREAM SMC RNA Challenge, an international competition organized by ICGC, TCGA, IBM, and Sage Bionetworks to determine the current gold standard for the detection of gene fusions from RNA-Seq data. As of round 4, Arriba is the best-performing algorithm.
Some more highlights:
- ability to detect intergenic and intronic breakpoints
- ability to detect exon duplications/inversions
- utilization of structural variants obtained from whole-genome sequencing
- filtering of transcript variants observed in healthy tissue
- comprehensive manual available at http://arriba.readthedocs.io/
- simple installation routine, especially if you already use STAR
We would be glad if you could give it a try, and we are happy to receive feedback! Please visit the homepage to download the code or to get help: https://github.com/suhrig/arriba/
Hi, dear professor: Thank you for providing such good open-source software, which helps us detect RNA fusion genes. Recently, I ran into a problem with a fusion gene: it can be found in the discarded file, but it cannot be recovered with the -k parameter (EML4 ALK), the -d parameter (2:42504565 2:29447602 +), -f no_genomic_support, or -f genomic_support. Could you please advise what the problem might be?
The fusions in the discarded file are listed as follows:
The best way to get help on such specific questions is to open an issue in the GitHub repository or to write a separate post as suggested by ATpoint.
But since we're already here, I'll answer briefly: the event is discarded because the breakpoints are intronic and there are very few supporting reads. There is one event with breakpoints at splice sites, though. If you use a list of known fusions (parameter -k) and the latest version of Arriba (v1.2.0), that event should be reported.
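For readers who want to try the -k suggestion, here is a minimal Python sketch that writes a known-fusions list and assembles the Arriba command line. It rests on assumptions, not on anything confirmed in this thread: the helper names and all file names are hypothetical placeholders, the known-fusions file is assumed to hold one tab-separated gene pair per line, and listing the pair in both orders is only a precaution. Please check the manual for the exact file layout expected by your version.

```python
import shlex
from pathlib import Path


def write_known_fusions(path, gene_pairs):
    """Write one tab-separated gene pair per line (assumed layout of the -k file)."""
    lines = [f"{gene1}\t{gene2}" for gene1, gene2 in gene_pairs]
    Path(path).write_text("\n".join(lines) + "\n")
    return str(path)


def build_arriba_command(bam, assembly, annotation, known_fusions,
                         output="fusions.tsv",
                         discarded="fusions.discarded.tsv",
                         blacklist=None):
    """Assemble the argument list for Arriba; every file name is a placeholder."""
    command = [
        "arriba",
        "-x", str(bam),            # alignments produced by STAR
        "-a", str(assembly),       # genome assembly (FASTA)
        "-g", str(annotation),     # gene annotation (GTF)
        "-k", str(known_fusions),  # list of known fusions
        "-o", str(output),         # reported fusions
        "-O", str(discarded),      # discarded fusions
    ]
    if blacklist is not None:
        command += ["-b", str(blacklist)]
    return command


if __name__ == "__main__":
    # Both orders are listed as a precaution; this may not be necessary.
    known = write_known_fusions("known_fusions.tsv",
                                [("EML4", "ALK"), ("ALK", "EML4")])
    cmd = build_arriba_command("Aligned.out.bam", "assembly.fa",
                               "annotation.gtf", known)
    # Inspect the command first, then run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

A real run would normally also supply a blacklist via the optional `blacklist` argument.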
Thank you very much for your patient guidance and good suggestions. I also tried running those samples with the latest version of Arriba. I can retrieve one fusion, which is in sample 209001876FR and has breakpoints at splice sites. However, the fusions with intronic breakpoints and little read support still cannot be recovered. Are there any other parameters you would advise? These samples are all positive samples verified by DNA data and FISH. I need to adjust a parameter to improve the detection sensitivity of the software for the ALK-EML4 fusion gene.
The filters column tells you which filters were responsible for discarding an event. You need to disable those using the parameter -f or tweak their thresholds. Generally, this is not recommended, however, because it will result in a high false positive rate. There simply are too few reads to detect the events reliably. The proper solution is to sequence more deeply. Given your data, maybe the best approach is to dig the events out of the discarded file like you have done.
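Digging events out of the discarded file can be scripted. The following Python sketch lists the discarded events for one gene pair together with the filters that removed them. It assumes the file is tab-separated, that its header line starts with "#" (for example "#gene1"), that it has columns named gene1, gene2, breakpoint1, breakpoint2 and filters, and that the filters column looks like "name(3),other(1)". The function names are made up for illustration; verify the column names against your own output.

```python
import csv
import re
import sys
from collections import Counter


def read_events(lines):
    """Yield one dict per event from tab-separated output lines."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Strip the leading '#' that is assumed to prefix the first header field.
        yield {key.lstrip("#"): value
               for key, value in row.items() if key is not None}


def events_for_pair(events, gene_a, gene_b):
    """Keep events whose two genes match the pair, in either order."""
    wanted = {gene_a, gene_b}
    return [e for e in events if {e["gene1"], e["gene2"]} == wanted]


def filter_names(event):
    """Turn 'name(3),other(1)' into ['name', 'other'] (assumed format)."""
    raw = event.get("filters") or ""
    return [re.sub(r"\(\d+\)$", "", item)
            for item in raw.split(",") if item and item != "."]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_discarded.py fusions.discarded.tsv
    with open(sys.argv[1], newline="") as handle:
        hits = events_for_pair(read_events(handle), "EML4", "ALK")
    tally = Counter()
    for event in hits:
        names = filter_names(event)
        tally.update(names)
        print(event["breakpoint1"], event["breakpoint2"], ",".join(names))
    print("filters seen:", dict(tally))
```

The tally shows which filter names you would have to pass to -f to let such events through, with the caveat about false positives mentioned above.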
Thank you very much for your kind help.