Hello everybody, I am fairly new to RNA-seq workflows and am currently struggling with how to evaluate the performance of the RNA-seq pipeline I am trying to establish, which will be used to investigate differential gene expression.
Let's say I have 3 different pipelines:
1. kallisto -> tximport -> DESeq2 -> 160 differentially expressed genes (random number)
2. salmon -> tximport -> DESeq2 -> 173 differentially expressed genes
3. STAR -> featureCounts -> DESeq2 -> 184 differentially expressed genes
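To make the comparison concrete: even without a ground truth, the agreement between the three gene lists can be quantified. The following is only a minimal sketch in Python, assuming each pipeline's significant genes have already been exported as a set of gene IDs. The gene IDs and the helper names `jaccard` and `pairwise_overlap` are hypothetical, and the result measures concordance between pipelines, not correctness.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |A & B| / |A | B| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(de_sets):
    """Return {(name1, name2): (n_shared, jaccard)} for every pair of pipelines."""
    result = {}
    for (n1, s1), (n2, s2) in combinations(de_sets.items(), 2):
        result[(n1, n2)] = (len(s1 & s2), jaccard(s1, s2))
    return result


# Hypothetical toy gene IDs, only to show the shape of the input.
de_sets = {
    "kallisto": {"geneA", "geneB", "geneC", "geneD"},
    "salmon": {"geneA", "geneB", "geneC", "geneE"},
    "STAR": {"geneA", "geneB", "geneF"},
}

for (p1, p2), (shared, jac) in pairwise_overlap(de_sets).items():
    print(f"{p1} vs {p2}: {shared} shared, Jaccard = {jac:.2f}")

# Genes called differentially expressed by every pipeline.
core = set.intersection(*de_sets.values())
print("Called by all pipelines:", sorted(core))
```

With real data, each set would hold the gene IDs passing the chosen adjusted p-value cutoff in the corresponding DESeq2 results table.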
The problem is that we do not know the "ground truth", i.e. which genes really are differentially expressed. How do I know which pipeline performs best? Are there any parameters I have to look out for? Furthermore, there are plenty of options within DESeq2 that influence the number of genes considered differentially expressed, e.g. the method used for log fold change shrinkage (apeglm vs. ashr) or the filter function (IHW vs. the default).
How do I determine which options to choose?
Thank you very much, any help is appreciated!