I'm looking to benchmark some well-known splicing algorithms against each other (e.g., MISO, MATS) to see how they compare in performance, efficiency, accuracy, etc. on biological data. For example, does splicing algorithm #1 pick up a known splicing variant that splicing algorithm #2 misses entirely? Or does algorithm #1 finish the same job three hours faster than algorithm #2? To do this, I'd like to find and use some "good" previously published alternative splicing (RNA-seq) datasets. Surprisingly, not many such datasets exist (and trust me, I've looked), so I wanted to ask the community for suggestions. Let me define what I mean by "good":
- Multiple samples (obviously)
- Clear splicing events (by clear, I mean very evident from FPKM values or some other measure)
- I do not have a preference for how these datasets were generated (i.e., what algorithm was used in the original paper)
- They need to be biologically validated (i.e., a good FPKM value alone doesn't cut it)
- Publicly available (obvious, until you realize how often FASTQ files that are supposed to be in SRA turn out, when you follow up with the paper's corresponding author, to have been "not requested by the reviewers")
Can anyone help me out here?
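
For context, here is a minimal sketch of the kind of comparison I have in mind once I have a dataset with validated events. Everything in it is an assumption for illustration: the event IDs are made up, `score_calls` and `timed` are hypothetical helpers, and I'm assuming each tool's output has already been parsed into a list of event IDs that can be matched against the validated set. It does not call MISO or MATS.

```python
import time

def score_calls(called, validated):
    """Compare one tool's called events against a validated set.

    Both arguments are iterables of event IDs (hypothetical format).
    Calls outside the validated set are only counted, not labelled
    false positives, because unvalidated does not mean wrong.
    """
    called, validated = set(called), set(validated)
    found = called & validated
    recall = len(found) / len(validated) if validated else 0.0
    return {
        "recall": recall,
        "missed": sorted(validated - called),
        "unvalidated_calls": len(called - validated),
    }

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Made-up example: three validated events, two tools' parsed outputs.
validated_events = {"geneA:SE:exon3", "geneB:RI:intron2", "geneC:A5SS:exon7"}
tool_calls = {
    "algorithm_1": ["geneA:SE:exon3", "geneB:RI:intron2",
                    "geneC:A5SS:exon7", "geneD:SE:exon2"],
    "algorithm_2": ["geneA:SE:exon3"],
}

for name, calls in tool_calls.items():
    scores, seconds = timed(score_calls, calls, validated_events)
    print(name, scores, f"(scored in {seconds:.6f} s)")
```

In a real benchmark, `timed` would wrap the actual tool run rather than the scoring step; it is wrapped around `score_calls` here only so the sketch runs on its own.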