I've sequenced a good number of patient samples using the protocol recommended for assessing splicing and differential gene expression (DGE), and, as I was advised, I'm using GTEx data as the control. I'm now noticing that gene expression is not comparable between these batches: many genes that are expressed in my internal controls and my patient samples are not expressed in GTEx.
Apart from strandedness, the sequencing protocols are identical. My thinking is that the difference arises because strand information was not retained in the GTEx protocol but was retained in mine. Does this sound correct? In other words, where loci on opposite strands overlap and it cannot be determined which strand a read's transcript originated from, do some genes simply not get counted?
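To make the overlapping-loci idea concrete, here is a toy sketch in Python. Everything in it is hypothetical: the gene names, coordinates and reads are invented, each read is assumed to already carry the strand of the transcript it came from (glossing over which mate reports the transcript strand in a given kit), and reads hitting more than one gene are simply discarded, a rule similar in spirit to union-mode counters rather than RSEM's probabilistic assignment.

```python
from collections import Counter

# Hypothetical toy annotation: two genes on opposite strands that overlap in 400-500.
GENES = {
    "GENE_A": ("+", 100, 500),  # (strand, start, end), half-open interval
    "GENE_B": ("-", 400, 800),
}

def overlapping_genes(pos, read_strand, stranded):
    """Return genes whose interval contains pos; if stranded, the strand must match."""
    hits = []
    for name, (strand, start, end) in GENES.items():
        if start <= pos < end and (not stranded or strand == read_strand):
            hits.append(name)
    return hits

def count_reads(reads, stranded):
    """Count reads per gene; reads hitting several genes are set aside as ambiguous."""
    counts = Counter()
    for pos, read_strand in reads:
        hits = overlapping_genes(pos, read_strand, stranded)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1  # hypothetical rule: ambiguous reads are not counted
        else:
            counts["__no_feature"] += 1
    return counts

# Hypothetical reads: (position, strand of the originating transcript).
reads = [(150, "+"), (450, "+"), (450, "-"), (480, "-"), (700, "-")]

print("stranded:  ", dict(count_reads(reads, stranded=True)))
print("unstranded:", dict(count_reads(reads, stranded=False)))
```

With strand information every read is assigned (GENE_A gets 2, GENE_B gets 3); without it, the three reads in the overlap become ambiguous and each gene keeps only 1 read, which is the kind of loss I suspect in the unstranded GTEx data.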
According to the post "TruSeq strand-specificity in rsem-calculate-expression", I can set the "--forward-prob" parameter to 0.5 for a non-strand-specific protocol (0.5 is also the default). Would this alleviate the problem?
With this setting, RSEM seems to ignore the strand information in the data, which would make samples sequenced with a stranded protocol comparable to those sequenced with an unstranded one.
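As I understand it, "--forward-prob" is the probability that a read is generated from the forward strand of a transcript, so setting it to 0.5 makes both orientations equally likely. The Python sketch below is a deliberately simplified, hypothetical model of that idea, not RSEM's actual likelihood: it assumes all other likelihood terms are equal and that the abundances are invented, and splits one read from the overlap between two transcripts on opposite strands.

```python
def orientation_weight(aligns_forward, forward_prob):
    """Likelihood factor for the read's orientation relative to a transcript.

    Simplified reading of --forward-prob: the read comes from the transcript's
    forward strand with probability forward_prob, otherwise from the reverse strand.
    """
    return forward_prob if aligns_forward else 1.0 - forward_prob

def share_between(transcripts, forward_prob):
    """Split one read between candidates in proportion to abundance x orientation
    weight; all other likelihood terms are assumed equal (a simplification)."""
    weights = {
        name: abundance * orientation_weight(fwd, forward_prob)
        for name, (abundance, fwd) in transcripts.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical read in the overlap: forward relative to GENE_A's transcript,
# and therefore reverse relative to GENE_B's. Values are (abundance, aligns_forward).
candidates = {"GENE_A": (1.0, True), "GENE_B": (3.0, False)}

for p in (1.0, 0.5):
    print(p, share_between(candidates, p))
```

With p = 1.0 the orientation alone decides and the read goes entirely to GENE_A; with p = 0.5 the orientation term is the same for both transcripts and the read is split 0.25/0.75 purely by abundance, i.e. the strand is no longer used.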
Can anyone tell me if this is correct?