Hello, I'm currently analyzing RNA-seq data to detect fusion transcripts. To do this, I'm trying out all the available tools so I can compare their performance and results. The problem is that my data aren't ideal for fusion detection: single-end reads of ~70 bp. I didn't sequence these data myself, but I have to work with them.
So far, I've tested 3 tools:
- Arriba (https://github.com/suhrig/arriba)
- FusionCatcher (https://github.com/ndaniel/fusioncatcher)
- STAR-Fusion (https://github.com/STAR-Fusion/STAR-Fusion/wiki)
I'm planning to test other tools (TopHat-Fusion, gFusion, FusionMap), but the results are already very different. Here is a Venn diagram giving an overview of my results across the entire dataset (~50 samples): http://zupimages.net/viewer.php?id=19/23/vecs.png
The results for each sample are very different: few transcripts are shared between the tools, and the shared ones are not consistent across the dataset. In my opinion, this isn't enough to interpret the results reliably.
I've also tested the 3 tools on a control sample containing 17 known fusion transcripts (I got it from the FusionCatcher GitHub repository; it was paired-end, so I concatenated the two reads into one to simulate single-end data). On this control, the results of the 3 tools are quite similar: http://zupimages.net/viewer.php?id=19/23/m0vg.png
These control results may be biased, though. The control was built by hand and contains only known fusion transcripts, so I suspect the tools perform better on it than they do on real data.
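To make the control comparison more quantitative than a Venn diagram, one option is to compute, for each tool, the fraction of the 17 known fusions it recovers (recall) and how many calls fall outside the known list. Below is a minimal Python sketch, assuming each tool's calls have already been reduced to ordered (5' gene, 3' gene) pairs. The gene names and tool keys are hypothetical placeholders, not the real contents of the FusionCatcher control set.

```python
from typing import Dict, Set, Tuple

# Assumption: a fusion is represented as an ordered (5' partner, 3' partner) gene pair.
Fusion = Tuple[str, str]


def score_against_truth(
    calls: Dict[str, Set[Fusion]], truth: Set[Fusion]
) -> Dict[str, Tuple[float, int]]:
    """For each tool, return (recall on the known fusions, number of calls not in the known list)."""
    report = {}
    for tool, found in calls.items():
        recovered = found & truth  # known fusions this tool detected
        extra = found - truth      # calls absent from the known list
        report[tool] = (len(recovered) / len(truth), len(extra))
    return report


if __name__ == "__main__":
    # Hypothetical placeholder data, for illustration only.
    truth = {("GENEA", "GENEB"), ("GENEC", "GENED"), ("GENEE", "GENEF"), ("GENEG", "GENEH")}
    calls = {
        "arriba": {("GENEA", "GENEB"), ("GENEC", "GENED"), ("GENEX", "GENEY")},
        "fusioncatcher": {("GENEA", "GENEB"), ("GENEC", "GENED"), ("GENEE", "GENEF")},
        "star_fusion": {("GENEA", "GENEB")},
    }
    for tool, (recall, n_extra) in score_against_truth(calls, truth).items():
        print(f"{tool}: recall={recall:.2f}, extra calls={n_extra}")
```

On a hand-built control, the "extra" calls may be false positives or simply fusions that the control does not annotate, so they are worth inspecting rather than counting blindly.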
I'd like to hear your point of view and any advice for my situation. If you have experience with fusion detection, what would you do? At the moment, my plan is to collect as many results as possible and keep only the fusion transcripts found by two or more tools.
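The "found by two or more tools" rule can be written as a simple vote, applied separately to each sample. Here is a minimal Python sketch, assuming the calls of each tool for one sample are already normalized to ordered (5' gene, 3' gene) pairs; in practice gene symbols may need harmonizing first, since tools can report different names for the same gene. The tool keys and gene names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: a fusion is an ordered (5' partner, 3' partner) gene pair.
Fusion = Tuple[str, str]


def consensus_calls(
    calls: Dict[str, Set[Fusion]], min_tools: int = 2
) -> Dict[Fusion, Set[str]]:
    """Keep fusions reported by at least `min_tools` tools, with the tools supporting each one."""
    support: Dict[Fusion, Set[str]] = {}
    for tool, found in calls.items():
        for fusion in found:
            support.setdefault(fusion, set()).add(tool)
    return {f: tools for f, tools in support.items() if len(tools) >= min_tools}


if __name__ == "__main__":
    # Hypothetical calls for a single sample.
    sample_calls = {
        "arriba": {("GENEA", "GENEB"), ("GENEX", "GENEY")},
        "fusioncatcher": {("GENEA", "GENEB"), ("GENEC", "GENED")},
        "star_fusion": {("GENEA", "GENEB"), ("GENEC", "GENED")},
    }
    for fusion, tools in sorted(consensus_calls(sample_calls).items()):
        print("--".join(fusion), sorted(tools))
```

Keying on the ordered pair treats A--B and B--A as different events; to match regardless of orientation, a sorted tuple could be used as the key instead. Running the vote per sample, rather than on calls pooled over all ~50 samples, avoids counting one tool's call in sample 1 and another tool's call in sample 2 as agreement.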
Thank you in advance.