I have been tasked with validating aberrantly expressed genes detected in Lexogen RNA-seq data (single-end, moderate sequencing depth, processed with my own pipeline) against TruSeq data (paired-end, high sequencing depth) processed with the GTEx (Genotype-Tissue Expression) RNA-seq pipeline. The samples sequenced with Lexogen and TruSeq are biological replicates.
EDIT: Both pipelines include deduplication, quality control, alignment, and gene quantification. To detect aberrantly expressed genes, gene-specific thresholds are established: if a gene's expression value is above its threshold, the gene is called aberrantly overexpressed. The same approach applies to underexpression.
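To make the thresholding step concrete, here is a minimal sketch of the idea. It assumes each gene has a lower and an upper cutoff; the gene names, values, and the function name `call_aberrant` are hypothetical and are not taken from my actual pipeline.

```python
def call_aberrant(expression, thresholds):
    """Label genes whose expression falls outside gene-specific cutoffs.

    expression: dict mapping gene -> expression value for one sample
    thresholds: dict mapping gene -> (lower, upper) cutoffs (hypothetical format)
    Returns a dict mapping gene -> "over" or "under" for aberrant genes only.
    """
    calls = {}
    for gene, value in expression.items():
        if gene not in thresholds:
            continue  # no cutoff established for this gene, so no call is made
        lower, upper = thresholds[gene]
        if value > upper:
            calls[gene] = "over"
        elif value < lower:
            calls[gene] = "under"
    return calls


# Made-up values purely for illustration
expression = {"GENE_A": 120.0, "GENE_B": 3.0, "GENE_C": 45.0}
thresholds = {
    "GENE_A": (10.0, 100.0),
    "GENE_B": (5.0, 60.0),
    "GENE_C": (20.0, 80.0),
}
calls = call_aberrant(expression, thresholds)
print(calls)
```

How the cutoffs themselves are derived is left out here on purpose; the sketch only shows how a value is compared against them.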
The logic behind this, I believe, is that if genes called aberrant in the single-end, moderate-depth data are also called aberrant in the paired-end, high-depth data run through a pipeline used by the Broad Institute, then we can conclude that our pipeline is adequate for detecting aberrantly expressed genes. Is that reasoning sound?
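A rough sketch of the comparison I have in mind, assuming each pipeline produces a per-sample mapping from gene to call direction ("over" or "under"). The function name `compare_calls`, the input format, and the example genes are all hypothetical. A call only counts as validated if both the gene and the direction agree.

```python
def compare_calls(lexogen_calls, truseq_calls):
    """Compare aberrant-expression calls from two platforms for one sample.

    Both inputs are dicts mapping gene -> "over" or "under" (hypothetical format).
    """
    lex = set(lexogen_calls.items())
    tru = set(truseq_calls.items())
    shared = lex & tru
    union = lex | tru
    return {
        "shared": sorted(shared),
        "lexogen_only": sorted(lex - tru),
        "truseq_only": sorted(tru - lex),
        # Fraction of Lexogen calls reproduced in TruSeq
        "fraction_validated": len(shared) / len(lex) if lex else float("nan"),
        # Overall agreement between the two call sets
        "jaccard": len(shared) / len(union) if union else float("nan"),
    }


# Made-up calls purely for illustration
lexogen = {"GENE_A": "over", "GENE_B": "under", "GENE_D": "over"}
truseq = {"GENE_A": "over", "GENE_B": "over", "GENE_E": "under"}
result = compare_calls(lexogen, truseq)
print(result["shared"], result["fraction_validated"], result["jaccard"])
```

In practice the comparison would probably need to be restricted to genes quantified by both pipelines, since a gene missing from one platform cannot agree or disagree.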
I know that one would normally use qPCR or ddPCR to confirm expression levels, but unfortunately I don't have that option at the moment.
What do you all think of this approach? If you have validated data in this way, what was your experience? I would appreciate any comments or critiques of my assumptions or approach. I am fairly new to the field and am excited to learn from you all!
EDIT: Corrected my use of NextSeq as a library prep and added some more details on the analysis pipeline. Hope that helps!