Hello!
First, thanks for the feedback. I'm the author of Oncofuse. I would like to note that chimeric junctions in RNA-STAR are selected based on the following criteria:
"the segments belong to different chromosomes, or different strands, or are far from each other"
Junctions on the same chromosome that didn't make it into the "chimeric" category should therefore be quite close to each other. Oncofuse filters out all junctions in which both read segments belong to the same gene, as those are splicing events and the tool focuses solely on gene fusions. Moreover, tools like TopHat-Fusion report lots of such junctions.
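To illustrate the same-gene filter, here is a minimal sketch. It is not Oncofuse's actual code: the Junction record, its field names, and the example entries are made up for illustration, and it assumes each side of a junction has already been annotated with a gene name.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    # Hypothetical record; gene names are assumed to be annotated upstream.
    gene5: str   # gene on the 5' side of the junction
    gene3: str   # gene on the 3' side of the junction
    reads: int   # number of supporting reads


def drop_same_gene(junctions):
    """Keep only junctions whose two sides map to different genes."""
    return [j for j in junctions if j.gene5 != j.gene3]


# Toy input, invented for this example.
junctions = [
    Junction("BCR", "ABL1", 12),   # two different genes: fusion candidate
    Junction("TP53", "TP53", 40),  # same gene: ordinary splicing, dropped
]
candidates = drop_same_gene(junctions)
```

Only the first record survives, since the second one joins two positions within a single gene.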
If the reads come from genes that are close to each other, then the transcript may be a readthrough. Such transcripts often occur in normal tissues, so a priori they are less likely to be oncogenic (yet there are many counter-examples).
Anyway, I believe the chimeric junction file also reports fusions on the same chromosome (see section 5.2 in https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf).
On the other hand, I agree that there appears to be no clearly documented parameter that sets the minimal distance between read segments for a junction to be considered chimeric. The most likely candidate is the --outSJfilterIntronMaxVsReadN option.
If there is really a chance that STAR misses important chimeric transcripts in the chimeric junctions file, then I'll consider implementing a parser for it.
EDIT
According to the reply here, --alignIntronMax is the parameter that controls which junctions get filtered to the "Chimeric" output files. Other options determine which junctions make it to the standard output, SJ.out.tab.
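As I read it, the quoted criteria combined with --alignIntronMax boil down to something like the sketch below. This is a simplified illustration and not STAR's actual logic: the function name, its arguments, and the strict "greater than" comparison are my own assumptions.

```python
def looks_chimeric(chrom_a, strand_a, pos_a,
                   chrom_b, strand_b, pos_b,
                   align_intron_max):
    """Simplified reading of the quoted criteria; not STAR's real code."""
    if chrom_a != chrom_b:
        return True            # segments on different chromosomes
    if strand_a != strand_b:
        return True            # segments on different strands
    # Same chromosome and strand: chimeric only if the segments are far apart.
    # Assumption: "far" means the gap exceeds align_intron_max.
    return abs(pos_b - pos_a) > align_intron_max


# Toy coordinates, invented for illustration.
nearby = looks_chimeric("chr9", "+", 1_000, "chr9", "+", 51_000, 1_000_000)
distant = looks_chimeric("chr9", "+", 1_000, "chr9", "+", 5_000_000, 1_000_000)
```

Here nearby is False (it would be treated as a regular splice junction) and distant is True, which is why a low --alignIntronMax pushes more same-chromosome junctions into the chimeric output.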
Best regards,
Mike
Hi Mike,
I was hoping you would react. Thanks btw for the very nice tool!
I think I understand the differences between the junction files. My understanding is that if the mapped read segments are on the same strand and on the same chromosome, they get reported in the splice junction output, subject of course to --outSJfilterIntronMaxVsReadN, which you definitely need to set high (~10 Mb, or even 100 Mb?), since otherwise junctions very far apart will end up in neither the chimeric junction file nor the splice junction file. We actually had an example of a known fusion gene whose partners were close together on the same chromosome and on the same strand, and it was not reported in either file because --outSJfilterIntronMaxVsReadN was set too small. I still think this is strange, because you could potentially miss junctions this way, although I don't know if this is still the case in recent versions. I feel this is something the author of RNA-STAR could think about, so I will see if I can contact him as well (maybe I'm missing something).
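For what it's worth, once such junctions do make it into SJ.out.tab, a scan like the one below could pull out the long-range ones for a closer look. It is only a rough sketch: the function name, the thresholds, and the toy input are my own, and it relies on the SJ.out.tab column layout as I understand it from the STAR manual (chromosome, intron start, intron end, strand code, motif, annotated flag, unique reads, multi-mapping reads, max overhang).

```python
import csv
import io


def long_range_junctions(handle, min_span, min_unique_reads=1):
    """Yield (chrom, start, end, strand, unique_reads) for wide junctions."""
    strand_map = {"0": ".", "1": "+", "2": "-"}  # SJ.out.tab strand codes
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 9:
            continue  # skip blank or malformed lines
        chrom, start, end = row[0], int(row[1]), int(row[2])
        unique_reads = int(row[6])
        span = end - start + 1  # intron length in bases
        if span >= min_span and unique_reads >= min_unique_reads:
            yield chrom, start, end, strand_map.get(row[3], "."), unique_reads


# Toy input, invented for illustration (two junctions on chr1).
example = io.StringIO(
    "chr1\t1000\t1500\t1\t1\t1\t25\t0\t40\n"
    "chr1\t2000\t900000\t1\t1\t0\t7\t0\t35\n"
)
hits = list(long_range_junctions(example, min_span=100_000))
```

With a real run you would pass an open SJ.out.tab file handle instead of the StringIO object. Of course this cannot recover junctions that STAR filtered out before writing the file, which is exactly the case we ran into.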
Thanks a lot for your reply.
Best,
Mark
OK, now I see; it would be wise to ask the author then. I've just created an issue at the STAR repository.
Please see the edit above; it clarifies a lot.