define alternative splice sites
5.3 years ago

Given junction.bed files from tophat, how can I define exon splicing events? For example: skipped exon, constitutive exon or skipped junctions.

alternative splicing • 1.4k views
5.3 years ago

That's what rMATS is for, though it takes the BAM file as input rather than the junction.bed file; in my opinion the junction.bed file is essentially worthless.

Also, stop using tophat. Use something better, like STAR or even hisat2.

So how do I use BAM files to look for splicing events? Could you explain the algorithm for detecting splicing events?

Thanks @Devon Ryan, I am aware of rMATS. I am picking up someone else's work midway, and I need to annotate junctions that were found to be differentially expressed using DEXSeq. I only have a set of junctions now, with no access to the FASTQ or BAM files.

Hmm, I'm sure there's something for this but it's not something I've ever needed to do. For the most part, what you're seeing is just changes in isoform usage.

Yes, I guess I need to write something myself.

5.3 years ago
Malcolm.Cook ★ 1.3k

Hi NBS:

If you provide

• an exhaustive list of the varieties of "exon splicing events" you wish to define
• descriptions of what you mean by each
• an example or two of each given as a simulated junction.bed file

... then perhaps someone will be able or inclined to take your challenge.

In my experience, these terms are not consistently defined in the literature so it would be a mistake to try and assume what you really want.

For example, I've never heard of "skipped junctions" as a kind of "exon splicing event"

Similarly, "constitutive exon" is a label given to an exon which appears in every (known) isoform of a gene. But it is not a name for an "exon splicing event".

So you really have to be quite specific in what you are asking for.

That said, I expect that however you frame the question, you will find that knowing just the locations of (putative) introns, as provided by a junction.bed file, will prove insufficient to answer it. This is because these files don't tell you where the surrounding exons begin and end. They just tell you where the introns are.
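To make concrete what a junction.bed line does and does not contain, here is a minimal sketch of pulling the intron out of one line. It assumes the TopHat layout as I understand it (BED12, two blocks per junction, each block being the read overhang on one side, and the score column holding the supporting read count); the function name and the sample line are made up for illustration.

```python
def intron_from_junction_line(line):
    """Return (chrom, intron_start, intron_end, strand, read_count).

    Assumes a TopHat-style BED12 junction line: the first and last blocks
    are the flanking read overhangs, so the intron is what lies between them.
    Coordinates stay 0-based, half-open, as in BED.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    chrom_start = int(fields[1])
    chrom_end = int(fields[2])
    read_count = int(fields[4])
    strand = fields[5]
    # blockSizes is a comma-separated list, sometimes with a trailing comma
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    intron_start = chrom_start + block_sizes[0]
    intron_end = chrom_end - block_sizes[-1]
    return chrom, intron_start, intron_end, strand, read_count


# Hypothetical line, not taken from any real file
example_line = "chr1\t1000\t1400\tJUNC00000001\t12\t+\t1000\t1400\t255,0,0\t2\t100,100\t0,300"
intron = intron_from_junction_line(example_line)
```

Note that the overhang blocks only say how far the supporting reads reached, not where the flanking exons begin or end.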

Nonetheless: you might think along these lines:

1. Consider your junctions.bed file(s) as a directed graph(s) with each line in the file representing an 'edge' connecting a 'donor' with an 'acceptor' site (where the sites are integers being the chromosomal coordinate).

2. Then split it up into a set of its 'connected components'.

3. Then relabel each connected component, changing each site's label from its chromosomal coordinate to its rank in the sorted list of that component's chromosomal coordinates.

Then each unique graph might correspond to a "kind" of exon splicing event.
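The three steps above can be sketched roughly as follows. This is only one way to do it, under some assumptions: "connected" is taken to mean weakly connected (edge direction is ignored when grouping), chromosome and strand are ignored, and the function names and toy coordinates are made up for illustration.

```python
from collections import defaultdict


def connected_components(junctions):
    """Group (donor, acceptor) edges into weakly connected components."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 1: every junction is an edge joining its donor and acceptor sites
    for donor, acceptor in junctions:
        parent[find(donor)] = find(acceptor)

    # Step 2: collect the edges that ended up under the same root
    groups = defaultdict(list)
    for donor, acceptor in junctions:
        groups[find(donor)].append((donor, acceptor))
    return [sorted(edges) for edges in groups.values()]


def relabel_by_rank(component):
    """Step 3: replace each coordinate by its rank within the component."""
    coords = sorted({c for edge in component for c in edge})
    rank = {c: i + 1 for i, c in enumerate(coords)}
    return tuple(sorted((rank[d], rank[a]) for d, a in component))


# Toy input: one shared-donor pair and one shared-acceptor pair
junctions = [(1100, 1200), (1100, 1300), (3100, 3400), (3200, 3400)]
motifs = [relabel_by_rank(c) for c in connected_components(junctions)]
```

Because the relabeled components are plain tuples, identical motifs compare equal and can be counted or used as dictionary keys.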

Example

(ignoring strand and chromosome for simplicity) given these junctions as input directed graph:

1100 1200
1100 1300
2100 2200
2100 2300
3100 3400
3200 3400


The connected components would be

1100 1200
1100 1300

2100 2200
2100 2300

3100 3400
3200 3400


which would be relabeled as

1 2
1 3

1 2
1 3

1 3
2 3


Now, you might decide that [[1,2],[1,3]] is the canonical motif for an alternative acceptor event (of which we have 2), and [[1,3],[2,3]] is the canonical motif for an alternative donor event (of which we have 1).

BUT, remember, you don't know where the surrounding exons end, so you might well be making a mistake in so doing.
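If you do decide on canonical motifs, counting them is just a table lookup. A minimal sketch, assuming only the two motifs named above; the table, the function name and the "unclassified" fallback are hypothetical, and the caveat about unknown exon boundaries still applies.

```python
from collections import Counter

# Hypothetical lookup table holding only the two motifs discussed above
MOTIF_NAMES = {
    ((1, 2), (1, 3)): "alternative acceptor",
    ((1, 3), (2, 3)): "alternative donor",
}


def count_events(motifs):
    """Count how often each named motif occurs among relabeled components."""
    names = (
        # Normalise each motif to a sorted tuple of tuples before the lookup
        MOTIF_NAMES.get(tuple(sorted(map(tuple, m))), "unclassified")
        for m in motifs
    )
    return Counter(names)


# The three relabeled components from the example
relabeled = [
    [[1, 2], [1, 3]],
    [[1, 2], [1, 3]],
    [[1, 3], [2, 3]],
]
counts = count_events(relabeled)
```

Anything that does not match a known motif lands in "unclassified", which is a useful reminder of how many components the chosen scheme fails to describe.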

If you know and represent the extent of the surrounding exons (as might be inferred from RNA-Seq coverage, or given as known in a GTF file), this kind of approach extends nicely. It is a little trickier, but doable.

FWIW: I still wonder whether these categories are really biologically meaningful. Many different schemes have been devised to classify them (a good review appears in: A General Definition and Nomenclature for Alternative Splicing Events), but fewer interesting studies have substantiated that these classes are biologically relevant, my prior efforts notwithstanding. I would appreciate being educated contrariwise here... For instance: do we know that different RBPs control switching between A3SS (Alternative 3' Splice Sites) than control switching between, say, MXE (Mutually Exclusive Exons)? That would be interesting!

@Malcolm.Cook thank you for the elaborate information. I lost you when you started talking about directed graphs; I will read up on this to make complete sense of it. From what you have explained in the example, let me assume this (the exon_skipped column was derived by passing BAM files to https://regtools.readthedocs.io/en/latest/commands/junctions-annotate/):

start   end exon_skipped
1100    1200    0
1100    1300    1


Using DEXSeq, it can be seen that the reads mapping to 1100 1300 differ significantly between the two conditions, which means that the previous exon (1100 1200) was skipped.

------A1100(5')------1200(3')B---------1300(3')C--------

Correct me if I am wrong in considering B a skipped exon.

Hi - Sorry, I can not advise you on your use of dexseq for this purpose or interpretation of its output.

Cheers, Malcolm

Actually I don't need the link, as I wrote, "I can not advise you on your use of dexseq for this purpose or interpretation of its output"