Hi!
I'm working on an RNA-Seq data analysis whose experimental design is as follows:
sample          phenotype   treatment
3 x Col         control     normal
3 x Col_HS      control     heat-shock
3 x mutant      mutant      normal
3 x mutant_HS   mutant      heat-shock
The main objective is to determine the effect of the mutation on transcription; the control vs heat-shock comparison serves as a quality check for the analysis, because the effects of heat stress are well studied. However, I don't know which approach to use to validate my analysis. I retrieved a list of 12 genes experimentally validated as differentially expressed under heat shock, and I checked whether they appear in my results.
I found 9 of the 12, but since I have more than 1000 differentially expressed genes, I don't know whether that is good or not.
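One thing I considered, just to check that 9 of 12 is better than chance, is a hypergeometric test on the overlap. A minimal sketch (the 20000 tested genes below is only a placeholder, not my real number):

```python
# Is recovering 9 of 12 validated heat-shock genes among ~1000 DE genes better than chance?
# Hypergeometric test; the numbers below are placeholders.
from scipy.stats import hypergeom

N = 20000   # total genes tested (placeholder, replace with the real number)
K = 12      # experimentally validated heat-shock genes
n = 1000    # genes called differentially expressed in my analysis
k = 9       # validated genes recovered among the DE genes

# P(X >= k) if the n DE genes were drawn at random from the N tested genes
p_value = hypergeom.sf(k - 1, N, K, n)
print(f"P(recovering >= {k} of {K} validated genes by chance) = {p_value:.2e}")
```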
My question is: how do you judge the relevance of an analysis? I am trying different tools and would like to compare them in order to choose the best one. How can I do that?
Thanks
Shouldn't the validation be done by an independent experimental method? You are only generating hypotheses to test by data analysis.
Thanks for all the answers. In fact, I am testing different methods for detecting alternatively spliced genes between two conditions (DEXSeq, rMATS, RATS, ...), and I saw that several tools (such as polyester) can generate simulated reads from parameters specific to an RNA-seq dataset (sequencing coverage, read length, etc.). I wanted to use this to then calculate the true positive rate, the false positive rate, etc. This would give me validation on a simulated dataset, in addition to the experimentally validated genes that I recover in my gene list. Do you think this is a good approach?
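If I go the simulation route, once the truth set from the simulator is known, the scoring itself is simple set arithmetic. A rough sketch of what I mean (the gene IDs and numbers are made up for illustration; the read simulation itself, e.g. with polyester, is not shown here):

```python
# Minimal sketch: score one tool's calls against a simulated truth set.
# All gene IDs and set sizes below are placeholders for illustration only.

def confusion(predicted, truth, universe):
    """Compare a tool's calls against the genes simulated as differentially spliced."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(universe - predicted - truth)
    return {
        "TPR": tp / (tp + fn) if tp + fn else 0.0,        # sensitivity / recall
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

universe  = {f"gene{i}" for i in range(1, 10001)}                # all simulated genes
truth     = {f"gene{i}" for i in range(1, 501)}                  # simulated as diff. spliced
predicted = {f"gene{i}" for i in range(1, 401)} | {"gene9000"}   # one tool's calls

print(confusion(predicted, truth, universe))
```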
Did you search the literature? For example:
Comparison of RNA-seq and microarray platforms for splice event detection using a cross-platform algorithm
Comparison of Alternative Splicing Junction Detection Tools Using RNA-Seq Data
edit: by the way, I agree with genomax and believe proper validation means confirming these results with different methods on (preferably) a different dataset. You can probably decrease the number of false positives by intersecting the results of different tools on one dataset, but this is not validation.
Good point - I didn't previously notice the "splicing" tag, so I revised my answer about gene expression programs.
I also personally like QoRTs + JunctionSeq (and sometimes SGSeq for specific genes). While JunctionSeq has a gene-level metric (and overall gene plots), I think it mostly compares exons and junctions (with different dispersions). So, describing "exon" or "junction" counts (rather than "gene" counts) may help you get answers more specifically related to splicing analysis.
However, I believe those are a little harder to benchmark, and I think the splicing analysis may require extra work to assess your results (and I think there is extra variability, such that having more replicates may be relatively more important). Nevertheless, sometimes even Sashimi plots in IGV can be useful for a gene of interest, even without any differential exon / splicing analysis.
I have not used JunctionSeq; I'll take a look at it, thanks.
Some experiments will be done to confirm some genes of interest (as h.mon and genomax said). I have already used IGV to visualise some AS genes, but the reason I want to benchmark the tools is that, in parallel with the analyses, I must write a report for my master's degree explaining my strategies and why I used certain parameters, thresholds, etc.
So my first idea was to use artificially generated reads (with more replicates, for example), so that the differentially spliced genes are known for this dataset, and then to run the tools while varying their parameters to maximize the true positive rate.
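Concretely, I was imagining a small parameter sweep scored against the simulated truth set, something like the sketch below (run_tool is a made-up placeholder for calling whichever tool is being tested, not a real API, and all numbers are dummies):

```python
# Hypothetical parameter sweep scored against a simulated truth set.
from itertools import product

truth = {f"gene{i}" for i in range(1, 501)}           # genes simulated as diff. spliced

def run_tool(min_reads, fdr):
    """Placeholder for running one splicing tool (DEXSeq, rMATS, ...) with the given
    settings and collecting the genes it calls significant.  Here it just returns a
    dummy set so the loop runs end to end."""
    n_called = int(400 * fdr / 0.05)
    return {f"gene{i}" for i in range(1, n_called + 1)}

results = []
for min_reads, fdr in product([5, 10, 20], [0.01, 0.05, 0.10]):
    called = run_tool(min_reads, fdr)
    tpr = len(called & truth) / len(truth)
    fp  = len(called - truth)
    results.append((min_reads, fdr, tpr, fp))

# Prefer the highest sensitivity, breaking ties by fewest false positives.
for min_reads, fdr, tpr, fp in sorted(results, key=lambda r: (-r[2], r[3])):
    print(f"min_reads={min_reads:>2}  FDR={fdr:.2f}  TPR={tpr:.2f}  FP={fp}")
```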
But if you tell me that alternative splicing studies are difficult to benchmark, maybe I won't waste my time, haha.
Thanks for all the advice anyway.
In a sense, the "benchmark" is trying to find the best way to represent your data (and validate your own results). So, in that sense, I think it is important.
However, notes in your paper's methods section should probably be enough (compared to spending time on a separate benchmark paper, which may or may not represent the best thing for somebody else to do).
Best of luck!