Question

Tool:Arriba: Fast and accurate gene fusion detection from RNA-Seq data

10

6.2 years ago

uhrigs ▴ 150

Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data of tumor samples. It is based on the ultrafast STAR aligner and the post-alignment runtime is typically just ~2 minutes. Hence, fusion detection comes at virtually no cost, since the alignment of FastQ reads is a task that needs to be done anyway in a typical RNA-Seq workflow.

Arriba has been submitted to the DREAM SMC RNA Challenge, an international competition organized by ICGC, TCGA, IBM, and Sage Bionetworks to determine the current gold standard for the detection of gene fusions from RNA-Seq data. As of round 4, Arriba is the best-performing algorithm.

Some more highlights:

ability to detect intergenic and intronic breakpoints
ability to detect exon duplications/inversions
utilization of structural variants obtained from whole-genome sequencing
filtering of transcript variants observed in healthy tissue
comprehensive manual available at http://arriba.readthedocs.io/
simple installation routine, especially if you already use STAR

We would be glad if you gave it a try, and we are happy to receive feedback!

Please visit the homepage to download the code or in case you need help: https://github.com/suhrig/arriba/

variant-calling RNA-Seq cancer gene-fusion • 8.2k views

ADD COMMENT • link updated 10 months ago by Ram 43k • written 6.2 years ago by uhrigs ▴ 150

0

Hi, dear professor: Thank you for providing such good open-source software, which helps us detect fusion genes in RNA data. Recently, I ran into a problem with a fusion gene: it can be found in the discarded file, but I cannot recover it with the -k parameter (EML4 ALK), the -d parameter (2:42504565 2:29447602 +), -f no_genomic_support, or -f genomic_support. Please advise on what the problem might be.

The fusions in the discarded files are listed as follows:

#sample	gene1	gene2	strand1(gene/fusion)	strand2(gene/fusion)	breakpoint1	breakpoint2	site1	site2	type	direction1	direction2	split_reads1	split_reads2	discordant_mates	coverage1	coverage2	confidence	closest_genomic_breakpoint1	closest_genomic_breakpoint2	filters	fusion_transcript	reading_frame	peptide_sequence	read_identifiers
209001871FR	EML4	ALK	+/+	-/-	2:42492418	2:29447274	intron	intron	inversion	downstream	downstream	1	0	0	39	0	low	.	.	duplicates(21),min_support	.	.	.	.
209001872FR	EML4	ALK	+/+	-/-	2:42504565	2:29447602	intron	intron	inversion	downstream	downstream	0	0	0	4	36	low	.	.	duplicates(4),low_entropy(1)	.	.	.	.
209001873FR	EML4	ALK	+/+	-/-	2:42527926	2:29448177	intron	intron	inversion	downstream	downstream	1	0	0	70	32	low	.	.	duplicates(31),min_support	.	.	.	.
209001875FR	ALK	EML4	-/+	+/+	2:29672744	2:42413090	intron	intron	deletion/3'-3'	downstream	upstream	0	0	0	0	29	low	.	.	duplicates(1),mismatches(1)	.	.	.	.
209001876FR	EML4	ALK	+/+	-/-	2:42552694	2:29446394	splice-site	splice-site	inversion	downstream	downstream	1	0	0	38	0	low	.	.	duplicates(16),min_support	.	.	.	.

ADD REPLY • link updated 4.0 years ago by ATpoint 82k • written 4.0 years ago by 421375212 • 0

1

The best way to get help on such specific questions is to open an issue in the GitHub repository or to write a separate post as suggested by ATpoint.

But since we're already here, I'll answer briefly: the events are discarded because the breakpoints are intronic and there are very few supporting reads. There is one event with breakpoints at splice sites, though. If you use a list of known fusions (parameter -k) and the latest version of Arriba (v1.2.0), that event should be reported.

ADD REPLY • link 4.0 years ago by uhrigs ▴ 150

0

Thank you very much for your patient guidance and good suggestions. I also tried running those samples with the latest version of Arriba. I can retrieve one fusion, the one in sample 209001876FR with breakpoints at splice sites. However, the fusions with intronic breakpoints and little read support cannot be recovered. Are there any other parameters you would advise? These samples are all positive samples verified by DNA data and FISH. I need to adjust a parameter to improve the detection sensitivity of the software for the ALK-EML4 fusion gene.

ADD REPLY • link 4.0 years ago by 421375212 • 0

0

The column filters tells you which filters were responsible for discarding an event. You need to disable those using the parameter -f or tweak their thresholds. Generally, this is not recommended, however, because it will result in a high false positive rate. There simply are too few reads to detect the events reliably. The proper solution is to sequence more deeply. Given your data, maybe the best approach is to dig the events out of the discarded file like you have done.
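
For anyone who wants to dig such events out of the discarded file programmatically, here is a minimal Python sketch. It is not part of Arriba. It assumes a tab-separated file with the column names shown in the table above (including the leading sample column that was added in that post), and the helper names parse_filters and find_gene_pair are made up for illustration. The embedded sample uses two rows from the table, reduced to the columns the script reads.

```python
import csv
import io
import re


def parse_filters(field):
    """Parse e.g. 'duplicates(21),min_support' into a dict.

    Filters with a read count map to that count; filters without one map to None.
    """
    parsed = {}
    if field in ("", "."):
        return parsed
    for item in field.split(","):
        match = re.fullmatch(r"([^()]+)\((\d+)\)", item)
        if match:
            parsed[match.group(1)] = int(match.group(2))
        else:
            parsed[item] = None
    return parsed


def find_gene_pair(handle, gene_a, gene_b):
    """Return all rows whose gene1/gene2 match the pair, in either order."""
    reader = csv.reader(handle, delimiter="\t")
    # The header line starts with '#', so strip it from the column names.
    header = [name.lstrip("#") for name in next(reader)]
    wanted = {gene_a, gene_b}
    hits = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        row = dict(zip(header, fields))
        if {row["gene1"], row["gene2"]} != wanted:
            continue
        hits.append({
            "sample": row.get("sample", "."),
            "breakpoints": (row["breakpoint1"], row["breakpoint2"]),
            "sites": (row["site1"], row["site2"]),
            # Supporting reads that remained after filtering.
            "support": sum(int(row[col]) for col in
                           ("split_reads1", "split_reads2", "discordant_mates")),
            "filters": parse_filters(row["filters"]),
        })
    return hits


# Abbreviated sample data taken from the table above (only the columns used here).
COLUMNS = ["#sample", "gene1", "gene2", "breakpoint1", "breakpoint2", "site1",
           "site2", "split_reads1", "split_reads2", "discordant_mates", "filters"]
ROWS = [
    ["209001871FR", "EML4", "ALK", "2:42492418", "2:29447274", "intron",
     "intron", "1", "0", "0", "duplicates(21),min_support"],
    ["209001876FR", "EML4", "ALK", "2:42552694", "2:29446394", "splice-site",
     "splice-site", "1", "0", "0", "duplicates(16),min_support"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [COLUMNS] + ROWS) + "\n"

if __name__ == "__main__":
    for hit in find_gene_pair(io.StringIO(SAMPLE), "EML4", "ALK"):
        print(hit)
```

To run it on real output, replace io.StringIO(SAMPLE) with an opened file handle for your discarded-fusions file. Keep in mind that anything recovered this way has failed at least one filter, so it needs independent validation, as was done in this case with DNA data and FISH.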

ADD REPLY • link 4.0 years ago by uhrigs ▴ 150

0

Thank you very much for your kind help.

ADD REPLY • link 4.0 years ago by 421375212 • 0

score 2 · Answer 1 · 2019-04-18

We are happy to announce that Arriba won first place in the DREAM SMC-RNA Challenge! The final results can be viewed here (requires a free Synapse account): https://www.synapse.org/#!Synapse:syn2813589/wiki/588511

As a result, Arriba will be presented at the DREAM Challenge satellite workshop of the RECOMB conference in Washington, D.C., at the beginning of next month.

In addition, since our first announcement on this forum a year ago, many improvements have been made to Arriba:

streamlined workflow, which makes Arriba even faster and easier to implement
installation via Docker, Singularity, and Bioconda
automatic generation of publication-quality figures
prediction of peptide sequences and retained protein domains
CRAM support

score 2 · Answer 2 · 2020-11-12

Version 2 of our gene fusion detection algorithm Arriba is available. It comes with a number of new features and enhancements:

detect viral integration sites
detect fusions supported by multi-mapping reads (e.g., CIC-DUX4, NPM1-ALK)
detect internal tandem duplications (e.g., FLT3, BCOR, ERBB2)
support for mouse (mm10)
more comprehensive annotation
speed improvements
accuracy enhancements

As usual, the code is available on GitHub: https://github.com/suhrig/arriba/releases

Documentation and installation instructions are available on ReadTheDocs: https://arriba.readthedocs.io/en/latest/quickstart/

score 0 · Answer 3 · 2021-03-20

We are proud to announce that our manuscript about Arriba has been published in this month's issue of the Genome Research journal. From now on, please cite the following article if you use Arriba for published research:

Sebastian Uhrig, Julia Ellermann, Tatjana Walther, Pauline Burkhardt, Martina Fröhlich, Barbara Hutter, Umut H. Toprak, Olaf Neumann, Albrecht Stenzinger, Claudia Scholl, Stefan Fröhling and Benedikt Brors: Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Research. March 2021 31: 448-460; Published in Advance January 13, 2021. doi: 10.1101/gr.257246.119

score 0 · Answer 4 · 2022-01-19

After almost a year of further development, including enhancements, new features, and bug fixes, the next version of our gene fusion detection tool Arriba is finally out (version 2.2.0). The code and user manual are available on GitHub: https://github.com/suhrig/arriba/

The most notable enhancements are:

improved detection of viruses and viral integration sites
improved detection of internal tandem duplications
support for mm39/GRCm39
utility scripts which facilitate common tasks related to fusion detection
polishing of fusion visualizations

More details can be found in the release notes: https://github.com/suhrig/arriba/releases