I have a question regarding an RNA-Seq experiment including ~250 leukaemic samples. For each of several mutations, I am interested in detecting differential isoform expression between the patients carrying that mutation and the controls (all other patients).
I am using edgeR + sva + limma + voom for this analysis.
To increase statistical power, I wish to filter out isoforms that are not expressed in the tissue of interest, as is done in a typical DE analysis. The problem is that one mutation is found in only 3 samples, and I wish to keep isoforms that are highly expressed in those 3 mutated samples but not expressed in the controls, and vice versa.
The approach I have come up with so far is:
Apply a general filter prior to the analysis and remove all isoforms that don't have at least 1 CPM in at least 3 samples:
library(edgeR)
dge <- DGEList(counts = cm)  # cm: count matrix
cps <- cpm(dge)
k <- rowSums(cps >= 1) >= 3  # at least 1 CPM in at least 3 samples
dge <- dge[k, , keep.lib.sizes = FALSE]
Perform the DE analysis
Filter out the differentially expressed isoforms that have very low expression in both groups of a contrast. For example if I am comparing mutation "A" against the control and I get DE isoforms that have low expression in both the samples with mutation "A" and in the control samples (maybe <1 CPM average in each group?), I would remove them, because they are most likely false positives or biologically not relevant.
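The post-hoc filter in the last step could be sketched roughly as follows. This is only an illustration on simulated counts: the matrix, the group labels, the 1 CPM threshold on the group mean, and the vector de_ids (standing in for the isoforms reported as significant by the DE analysis) are all assumptions, not part of the real pipeline.

```r
library(edgeR)

set.seed(1)
# Simulated stand-in for the real data (hypothetical): 200 isoforms x 20 samples
cm <- matrix(rnbinom(200 * 20, mu = 5, size = 2), nrow = 200,
             dimnames = list(paste0("iso", 1:200), paste0("s", 1:20)))
cm[1:20, ] <- 0  # make the first 20 isoforms unexpressed everywhere
group <- factor(c(rep("mutA", 3), rep("control", 17)))

dge <- DGEList(counts = cm)
cps <- cpm(dge)

# Mean CPM of every isoform within each group (rows: isoforms, columns: groups)
group_means <- sapply(levels(group), function(g) {
  rowMeans(cps[, group == g, drop = FALSE])
})

# Keep an isoform if its mean CPM reaches the threshold in at least one group
min_mean_cpm <- 1  # assumed threshold
expressed <- rowSums(group_means >= min_mean_cpm) >= 1

# de_ids: placeholder for the isoform IDs called significant in the contrast
de_ids <- rownames(cm)[1:50]
de_kept <- de_ids[expressed[de_ids]]
```

Here an isoform survives if either group, including the 3-sample mutation group, has a mean of at least 1 CPM, so isoforms expressed only in the mutated samples are retained.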
Is this a meaningful approach? Or can we assume that if an isoform has low expression in both groups it will not be significantly differentially expressed in the first place?
This problem is magnified when looking at smaller loci (exons, introns) using the diffSplice function of limma, where we have a large number of loci with very low expression, some of which may nonetheless be expressed at relevant levels in the mutated samples.
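For concreteness, the exon-level case might look roughly like the sketch below. It runs on simulated exon counts, applies the same "at least 1 CPM in at least 3 samples" rule, and leaves out the sva step; the gene/exon annotation, group sizes, and design are assumptions made purely for illustration.

```r
library(edgeR)
library(limma)

set.seed(2)
# Simulated exon-level counts (hypothetical): 50 genes x 4 exons, 20 samples
n_genes <- 50; n_exons <- 4; n_samples <- 20
annot <- data.frame(
  GeneID = rep(paste0("gene", seq_len(n_genes)), each = n_exons),
  ExonID = rep(seq_len(n_exons), times = n_genes)
)
counts <- matrix(rnbinom(n_genes * n_exons * n_samples, mu = 50, size = 5),
                 ncol = n_samples)

# 3 mutated samples versus 17 controls, control as the reference level
group <- factor(c(rep("mutA", 3), rep("control", 17)),
                levels = c("control", "mutA"))
design <- model.matrix(~ group)

dge <- DGEList(counts = counts, genes = annot)

# Same general filter as at the isoform level
keep <- rowSums(cpm(dge) >= 1) >= 3
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# voom + linear model, then test for differential exon usage within genes
v <- voom(dge, design)
fit <- lmFit(v, design)
ex <- diffSplice(fit, geneid = "GeneID", exonid = "ExonID")

# Gene-level Simes test for the mutA coefficient (second column of the design)
top <- topSplice(ex, coef = 2, test = "simes", number = 10)
```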
Any feedback on this or a different suggestion is greatly appreciated!