I am interested in the differential expression of splice junctions.
For this purpose I have a count matrix that gives, for each sample, the number of reads spanning each splice junction.
I want to perform a DE analysis at the splice-junction level, but I have not found a suitable tool for this task. I have considered DEXSeq, but I am reluctant to use it, because DEXSeq is too computationally expensive for the number of samples I have.
I have already used a voom-limma pipeline for DE analysis of genes and transcripts, as well as a voom-diffSplice pipeline for DE analysis of exons.
Now I am wondering whether a voom pipeline would be suitable for DE of splice junctions. I remember reading somewhere that limma is suitable for any genomic feature, but since a splice junction is defined by its two borders and has no actual length, I am not sure the analysis would be valid.
If it is, I am unsure whether voom-limma or voom-diffSplice would be more appropriate for this purpose.
I would appreciate any help or caveats regarding such an analysis with limma, or any tips on other tools that could perform it.
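One point relevant to the length concern: voom works on log-counts-per-million, and that transform needs only each feature's read count and each sample's library size, so a feature's length never enters it. Below is a minimal Python sketch of that transform, assuming the 0.5 count offset and 1 library-size offset that limma's voom applies. It ignores normalization factors and the mean-variance weighting step, and the junction counts are made-up numbers, not data from this post.

```python
import numpy as np

def log_cpm(counts):
    """log2 counts-per-million with voom-style offsets (simplified sketch).

    counts: features x samples array of raw read counts.
    Library sizes are plain column sums; normalization factors are ignored.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total reads per sample (one value per column)
    # add 0.5 to every count and 1 to every library size, as voom does,
    # so that zero counts still give a finite log value
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)

# three hypothetical junctions (rows) x two samples (columns)
junction_counts = np.array([[10, 20],
                            [0, 5],
                            [90, 175]])
print(log_cpm(junction_counts).round(2))
```

Note that nothing in the function refers to a feature's width, which is why a junction (or any other counted feature) can be fed to it in the same way as a gene or exon.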
Please explain your experiment, and the questions you want to put to your data, more carefully.
I ask because I expect you are really interested in the "expression of splice junctions" as compared to something else (roughly, say, the "expression of exons").
I guess this because you are considering DEXSeq as a way to answer your questions, yet DEXSeq cannot use the data you have: it quantifies "differential exon usage", and nowhere in its analysis does it use data such as yours (what might be called "junction read counts").
Its manual does state that "a change in relative exon usage is typically due to a change in the rate with which this exon is spliced into transcripts (alternative splicing)", but this relationship is assumed, and neither tested nor modelled.
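To make "relative usage" concrete for junction counts, here is a simplified Python illustration: for one gene, each junction's share of that gene's junction reads in each sample. This is not DEXSeq's or diffSplice's actual model (both test such changes within a statistical framework rather than by comparing raw ratios), and the function name and input are hypothetical.

```python
import numpy as np

def relative_usage(gene_counts):
    """Fraction of one gene's junction reads falling on each junction, per sample.

    gene_counts: junctions x samples array of read counts for ONE gene
    (hypothetical input). Samples with no junction reads for the gene get NaN.
    """
    gene_counts = np.asarray(gene_counts, dtype=float)
    totals = gene_counts.sum(axis=0)  # junction reads for this gene, per sample
    # np.where evaluates both branches, so silence the 0/0 warnings it triggers
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, gene_counts / totals, np.nan)

# two hypothetical junctions of one gene (rows) x two samples (columns)
print(relative_usage(np.array([[30, 10],
                               [10, 30]])))
```

In this made-up example the gene has 40 junction reads in both samples, so a gene-level DE test would see no change, yet the share carried by each junction flips between samples. That shift in usage, rather than in total expression, is the kind of change a relative-usage test targets.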
Here are some questions to answer that might allow folk in this forum to help you better. Don't worry for now about the algorithm, the test, or how "computationally expensive" anything is. Focus on the science and the data:
Why do you say "DEXSeq is too computationally expensive for the number of samples I have."?
How many conditions do you have, and what are they?
How many biological replicates of each condition?
Any technical replicates?
Do you really only have junction read counts to go by? What happened to the upstream results (alignments, fastqs, etc)?
And most importantly, what are you really hoping to ask of your data?