While going through the results of KissDE, I noticed a strange repetition of events that I didn't see before the update of Kiss2refgenome (v2.0.0).
Two different examples here:
Two strictly identical IRs for NUBP2. For both, the genomic positions of the splice sites (on the lower path) are 1836656 and 1836758. The variable part length is 101 for both.
The only difference comes from the genomic block size of the upper path: 177 for one, 178 for the other. So unless I am mistaken, I am looking here at the exact same intron retention. But the event has been reported twice by KisSplice, with only 1 base of difference in the sequence, not even in the event itself.
EDIT: both sets of sequences (bcc_7866|Cycle_2 and bcc_7866|Cycle_13) have a substitution (C>A) at the exact last base, so it is present in both the upper and the lower path, and therefore outside of the intron.
Another example, even stranger:
Two IRs for TSPAN32. This time there is absolutely no difference. Same block size, same splice sites, same variable part.
In the end, the only difference I can see is a small variation in the read coverage: only one read for only one sample in the first example, and one or two reads for several samples in the second example. But it is still the same event...
EDIT: both sets of sequences (bcc_167629|Cycle_2352655 and bcc_167629|Cycle_2352656) have exactly the same lower path, while there is a T>A substitution in the upper path, so directly in the intron. That explains it, I guess.
I might not be clear, so here are the 2 examples with all the data from KissDE: https://docs.google.com/spreadsheets/d/1K9FSZAqcEcu8QLos6yqXAG3BU8LYxX5eI1HWuiDvJBw/edit?usp=sharing
There are several other examples like that, not limited to intron retention, and in several analyses (on completely different samples).
I don't know if the aligner might have something to do with it, but as far as I remember, I used the same version of STAR before and after the Kiss2refgenome update.
EDIT: so the culprit was a one-base variation. In the first example it is outside of the intron, in the second example inside the intron. In the end, those events really are duplicates. It is still strange that this type of variation didn't appear before the update. In a list of 2000 differentially expressed events, there are something like 150, maybe 200, of those "duplicated" events. (At a quick glance, it is the same for my other analyses.)
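In case it helps to count these, here is a minimal sketch of how such "duplicated" events could be flagged: events are grouped by gene, event type, splice site positions and variable part length, and any group with more than one event ID is reported. The dictionary keys (`gene`, `event_type`, `splice_start`, `splice_end`, `variable_length`, `event_id`) are hypothetical names chosen for illustration, not the actual column headers of the KissDE or Kiss2refgenome output, and the third record is made up to show a non-duplicate.

```python
from collections import defaultdict


def group_duplicate_events(events):
    """Return groups of event IDs that share the same genomic description.

    `events` is a list of dicts. The keys used here are hypothetical names,
    not the real KissDE / Kiss2refgenome column headers.
    """
    groups = defaultdict(list)
    for ev in events:
        key = (
            ev["gene"],
            ev["event_type"],
            ev["splice_start"],
            ev["splice_end"],
            ev["variable_length"],
        )
        groups[key].append(ev["event_id"])
    # Keep only descriptions reported by more than one event.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


events = [
    # The two NUBP2 events from the first example above.
    {"event_id": "bcc_7866|Cycle_2", "gene": "NUBP2", "event_type": "IR",
     "splice_start": 1836656, "splice_end": 1836758, "variable_length": 101},
    {"event_id": "bcc_7866|Cycle_13", "gene": "NUBP2", "event_type": "IR",
     "splice_start": 1836656, "splice_end": 1836758, "variable_length": 101},
    # Made-up record, only here to show an event that is not a duplicate.
    {"event_id": "hypothetical_event", "gene": "GENE_X", "event_type": "IR",
     "splice_start": 100, "splice_end": 250, "variable_length": 149},
]

duplicates = group_duplicate_events(events)
for key, ids in duplicates.items():
    print(key, ids)
```

Grouping on coordinates rather than on sequences is a deliberate choice in this sketch: the two paths differ by one base, so a comparison of the sequences themselves would not treat them as the same event.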
EDIT: Maybe I should have mentioned that this analysis was only done with the type_1 file of KisSplice, so it only concerns splicing events.
Thanks for your help!