Hello everyone!
Last week I was trying to compare RNA-seq isoform-level quantification with NanoString data in order to assess the reproducibility between the two platforms. I have two samples with both data types available, and a gene signature in which I'm especially interested. The tools I have been using are STAR for mapping and RSEM for quantification, with hg38 as the reference genome and hg38.ensGene.gtf (downloaded from the UCSC site).
The pipeline runs without problems, but the results do not match my expectations at all. For some of these genes, I have observed that the "expected_count" and "TPM" values are 0, even though when I open the BAM files in IGV I can see reads mapping to these isoforms. An example is VEGFA, for which I link a screenshot.
Reads are clearly mapping to VEGFA, yet some of the isoforms have a TPM of 0 (for example, the one marked in grey, which is the largest). These results reproduce when using MapSplice as the aligner instead of STAR.
Why is this happening? Why is RSEM assigning reads to some isoforms and not others when they are so similar? Please help.
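For context on why this can happen: RSEM resolves reads that are compatible with several isoforms using an expectation-maximization (EM) algorithm, and EM can legitimately drive an isoform's abundance to exactly 0 even though reads overlap it in IGV. A toy sketch (not RSEM itself; it ignores effective lengths, fragment models, and mapping qualities) of the behaviour, assuming isoform B's reads are all also compatible with isoform A while A has some unique reads:

```python
# Toy EM illustration: two isoforms, A and B. Every read compatible with B
# is also compatible with A (shared exons), but A has 10 unique reads.
# EM then attributes ALL ambiguous reads to A and B converges to 0,
# even though reads visibly pile up on B's exons in a browser.

# Each read is the set of isoforms it is compatible with.
reads = [{"A"}] * 10 + [{"A", "B"}] * 50

theta = {"A": 0.5, "B": 0.5}  # initial abundance estimates

for _ in range(200):  # EM iterations
    counts = {"A": 0.0, "B": 0.0}
    for compat in reads:
        z = sum(theta[i] for i in compat)
        for i in compat:
            counts[i] += theta[i] / z  # E-step: fractional read assignment
    total = sum(counts.values())
    theta = {i: c / total for i, c in counts.items()}  # M-step: re-estimate

print(theta)  # theta["B"] converges to ~0
```

The intuition: A's unique reads tip the abundance estimate toward A, which in turn pulls a larger share of the ambiguous reads toward A on each iteration, until B receives essentially nothing. The data are perfectly explained without B, so the maximum-likelihood solution sets B to 0. This is expected behaviour for highly similar isoforms, not a pipeline bug.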
Yours truly, Arturo
As a small aside: is there any reason to prefer RSEM over Salmon/Kallisto?
I seem to remember that the latest paper from Rob Patro's lab showed that most of the differences between STAR/RSEM and Salmon were down to the STAR part rather than the RSEM part, and that the latest version of Salmon, which uses decoy sequences, is more similar to STAR/RSEM than earlier versions were.
Thank you for your clear and concise answer. It was very helpful!