Question: Identical splicing events reported twice with KisSplice?
david.b.rombaut wrote (3 months ago):

Hi,

While going through the KissDE results, I noticed a strange repetition of events that I did not see before the update of Kiss2refgenome (v2.0.0).

Here are two different examples:

  1. Two strictly identical IR (intron retention) events for NUBP2. For both, the genomic positions of the splice sites (on the lower path) are 1836656 and 1836758. The variable part length is 101 for both.

    The only difference comes from the genomic block sizes of the upper path: 177 for one, 178 for the other. So unless I am mistaken, I am looking at the exact same intron retention. But the event has been reported twice by KisSplice, with only a one-base difference in the sequence, not even in the event itself.

    EDIT: both sets of sequences (bcc_7866|Cycle_2 and bcc_7866|Cycle_13) have a substitution (C>A) at the very last base, so it is present in both the upper and the lower path, and is therefore outside the intron.

Another example, even stranger:

  1. Two IR events for TSPAN32. This time there is absolutely no difference: same block sizes, same splice sites, same variable part.

    In the end, the only difference I can see is a small variation in the read coverage: only one read for only one sample in the first example, and one or two reads for several samples in the second example. But it is still the same event...

    EDIT: both sets of sequences (bcc_167629|Cycle_2352655 and bcc_167629|Cycle_2352656) have exactly the same lower path, while there is a T>A substitution in the upper path, so directly in the intron. That explains it, I guess.

I might not be clear, so here are the 2 examples with all the data from KissDE: https://docs.google.com/spreadsheets/d/1K9FSZAqcEcu8QLos6yqXAG3BU8LYxX5eI1HWuiDvJBw/edit?usp=sharing

There are several other examples like these, not limited to intron retention, and in several analyses (on completely different samples).

I don't know if the aligner might have something to do with it, but as far as I remember, I used the same version of STAR before and after the Kiss2refgenome update.

EDIT: so the culprit was a one-base variation: outside the intron in the first example, inside the intron in the second. In the end, those events really are duplicates. It is still strange that this type of variation did not appear before the update. In a list of 2000 differentially expressed events, there are something like 150, maybe 200, of those "duplicated" events. (At a quick glance, the same holds for my other analyses.)

EDIT: Maybe I should have mentioned that this analysis was done only with the type_1 file of KisSplice, so it only concerns splicing events.

Thanks for your help!

audric.cologne wrote (10 weeks ago):

Hello David,

Sorry for the very late answer!

You are absolutely right: the redundancy removal step has not been implemented in the latest version of kissplice2refgenome, which is a mistake. We will add it as soon as possible! Thank you very much for digging into this bug and reporting it to us!

Regards, Audric Cologne


Hi,

Thanks for the answer! So to be clear, there is no reality behind those events? I ask because I have another case in mind where 2 events seem to be exactly the same, except for the junction. There was a one-base difference exactly at the junction site, shifting it from a canonical to a non-canonical site.

Thanks!

(written 10 weeks ago by david.b.rombaut)

Hi David,

The short answer is that these events are real, as they are supported by reads, but most of the time we should merge them together.

The redundancy problem comes from a particular, key structure of the de Bruijn graph: the bubble. KisSplice is optimised to find such structures because each splicing event creates a bubble in the de Bruijn graph. BUT not all bubbles describe a splicing event. SNVs, indels, and inexact repeats, among others, also create bubbles in a de Bruijn graph. Now, let's say that we have an intron retention event (one bubble), but the retained intron exists in two forms: with or without a deletion. This creates a bubble inside the previous bubble. As a result, KisSplice will output ALL POSSIBLE BUBBLES: spliced intron + retained intron without deletion AND spliced intron + retained intron with deletion. And we have a "duplicated" event. The point is that, if one is interested in splicing events, this deletion does not carry useful information.
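To make the nested-bubble picture concrete, here is a minimal toy sketch in Python. It is not KisSplice's implementation: the sequences, the k-mer size and the function names are all made up for illustration, the graph simply links consecutive k-mers seen in the input, and reverse complements and graph compaction are ignored.

```python
from collections import defaultdict

K = 5  # k-mer size chosen for this toy example only

def build_graph(seqs, k=K):
    """Nodes are k-mers; an edge links two k-mers that are consecutive in a sequence."""
    graph = defaultdict(set)
    for seq in seqs:
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for a, b in zip(kmers, kmers[1:]):
            graph[a].add(b)
    return graph

def all_paths(graph, source, sink):
    """Enumerate every source-to-sink path (this toy graph has no cycle)."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == sink:
            yield path
            continue
        for nxt in sorted(graph.get(path[-1], ())):
            stack.append(path + [nxt])

def spell(path):
    """Turn a path of overlapping k-mers back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

# Made-up sequences: two exons, one intron, and the same intron with one base deleted.
EXON1, EXON2 = "ACGATCGG", "CATTGACC"
INTRON = "GTAAGCTTCAG"
INTRON_DEL = "GTAAGTTCAG"

transcripts = [EXON1 + EXON2, EXON1 + INTRON + EXON2, EXON1 + INTRON_DEL + EXON2]
graph = build_graph(transcripts)
source, sink = transcripts[0][:K], transcripts[0][-K:]

paths = sorted((spell(p) for p in all_paths(graph, source, sink)), key=len)
lower, uppers = paths[0], paths[1:]          # shortest path = spliced form
events = [(lower, upper) for upper in uppers]  # one "event" per longer path

for lower_path, upper_path in events:
    print("variable part length:", len(upper_path) - len(lower_path), upper_path)
```

In this toy graph the deletion opens a small bubble inside the intron retention bubble, so pairing the spliced path with each retained-intron path yields two events that share the same lower path and differ only by the deleted base in the upper path.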

The main issue is that redundant bubbles create problems during the quantification step, as reads will be multimapped between redundant bubbles (except for the reads covering the indel position in our example, which are the only ones able to discriminate between the two bubbles), and we end up losing statistical power by splitting our reads among redundant bubbles.

We are currently working on KisSplice to integrate the redundancy removal, among other major performance and accuracy improvements to the quantification step. So, in the near future, KisSplice (and not KisSplice2RefGenome) will merge the redundant bubbles.
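In the meantime, candidate duplicates can be flagged downstream. The sketch below is only an assumption about how one might do it, not the redundancy removal planned for KisSplice: it groups events that share the same gene, splice-site positions and variable part length, the criteria used for the NUBP2 example in the question. The field names (`event_id`, `gene`, `splice_sites`, `variable_length`) are hypothetical and do not correspond to actual KissDE or kissplice2refgenome column names; the third record is a made-up placeholder.

```python
from collections import defaultdict

def group_redundant(events):
    """Group event IDs that share gene, splice sites and variable part length."""
    groups = defaultdict(list)
    for ev in events:
        key = (ev["gene"], tuple(sorted(ev["splice_sites"])), ev["variable_length"])
        groups[key].append(ev["event_id"])
    # Keep only the keys reported more than once.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

events = [
    # Values taken from the NUBP2 example in the question.
    {"event_id": "bcc_7866|Cycle_2", "gene": "NUBP2",
     "splice_sites": (1836656, 1836758), "variable_length": 101},
    {"event_id": "bcc_7866|Cycle_13", "gene": "NUBP2",
     "splice_sites": (1836656, 1836758), "variable_length": 101},
    # Hypothetical unrelated event, made up for illustration.
    {"event_id": "hypothetical_event", "gene": "GENE_X",
     "splice_sites": (100, 250), "variable_length": 150},
]

redundant = group_redundant(events)
for key, ids in redundant.items():
    print(key, "->", ids)
```

Two events whose junctions differ by one base, like the canonical versus non-canonical case mentioned above, would not be grouped by this key. How to combine the read counts of grouped events is deliberately left open, since it depends on how the shared reads were assigned.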

I hope this was clear enough... Do not hesitate to ask us any questions, we'll be glad to answer :)

Have a nice day!

(written 10 weeks ago by audric.cologne)

Hi,

It was very clear. It is good to see those new developments for KisSplice!

Thanks for your help and have a nice day!

(written 7 weeks ago by david.b.rombaut)