I am working on a project comparing RNA-seq quantification results between Illumina short reads and Nanopore long reads, and I have a couple of questions about how to compare the two technologies. Specifically, I need help figuring out how to normalize the data for comparisons within samples and between samples. So far I have come up with the following plan:
Using CPM to compare gene/transcript expression within each sample sequenced with Nanopore, for example to check whether gene.X transcripts are more abundant than gene.Y transcripts within sample_1. Using CPM instead of TPM for Nanopore seems like a good option, since our Nanopore runs do not show transcript length bias. Does this sound like a good strategy?
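A minimal sketch of the CPM step described above, assuming raw counts for one sample are held in a Python dict keyed by gene/transcript ID (the function name and the counts are hypothetical). No length correction is applied, which relies on the post's assumption that the Nanopore runs have no transcript length bias.

```python
def counts_per_million(counts):
    """Scale raw read counts for ONE sample so they sum to one million.

    Assumption (from the post): each Nanopore read corresponds to one
    transcript molecule, so no length correction is applied.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted reads")
    # multiply before dividing so round inputs stay exact in floating point
    return {gene: count * 1e6 / total for gene, count in counts.items()}


nanopore_sample_1 = {"gene.X": 300, "gene.Y": 700}  # hypothetical counts
cpm = counts_per_million(nanopore_sample_1)
print(cpm)  # {'gene.X': 300000.0, 'gene.Y': 700000.0}
```

Within a single sample, CPM preserves the ratio between genes (here 300:700), so ranking gene.X against gene.Y gives the same answer with raw counts or CPM; CPM only matters once samples with different depths are put side by side.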
Using TPM to compare gene/transcript expression within each sample sequenced with Illumina, for example to check whether gene.X transcripts are more abundant than gene.Y transcripts within sample_1. Using TPM instead of CPM for Illumina seems like a good option, since Illumina has transcript length bias (a single long transcript yields more fragments, and therefore more counts, than a single short transcript). Does this sound like a good strategy?
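A minimal sketch of the TPM calculation described above: divide each count by transcript length, then rescale so the sample sums to one million. Assumptions: counts and lengths are plain dicts with hypothetical values, and plain transcript lengths are used; Salmon itself uses *effective* lengths and already reports a TPM column in its output, so this is only meant to show the arithmetic.

```python
def transcripts_per_million(counts, lengths):
    """Length-normalize counts for ONE sample, then rescale to sum to 1e6.

    `counts` and `lengths` map the same IDs to read counts and transcript
    lengths (any unit; it cancels out). Plain lengths are an assumption;
    Salmon uses effective lengths.
    """
    # assuming uniform coverage, reads per unit length is proportional
    # to the number of transcript molecules
    rates = {t: counts[t] / lengths[t] for t in counts}
    total_rate = sum(rates.values())
    return {t: rate / total_rate * 1e6 for t, rate in rates.items()}


illumina_counts = {"gene.X": 1000, "gene.Y": 500}  # hypothetical counts
lengths = {"gene.X": 2000, "gene.Y": 500}          # hypothetical, in bases
tpm = transcripts_per_million(illumina_counts, lengths)
# gene.X has twice as many reads, but gene.Y ends up with twice the TPM
# (about 333333.3 vs 666666.7)
```

The example illustrates the length bias mentioned in the post: ranking by raw counts and ranking by TPM disagree whenever a longer transcript collects more reads than a shorter but more abundant one.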
This is where I am having trouble coming up with a good normalization strategy: comparing gene/transcript expression for the same sample sequenced with both Illumina and Nanopore, e.g., computing a Spearman correlation between gene expression in sample_1 sequenced with Illumina and sample_1 sequenced with Nanopore. I am not sure what would work here, since Illumina has transcript length bias and Nanopore does not. Do you have any suggestions?
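One observation that may help frame the cross-platform comparison: Spearman correlation uses only ranks, so any per-sample scale factor (library size, the 1e6 in CPM/TPM) drops out, whereas a length correction does change the ranking of genes. Under the post's own assumptions, Illumina TPM and Nanopore CPM both aim to estimate relative molecule abundance, so they are one plausible pair to correlate. A minimal sketch, assuming both tables are dicts keyed by gene ID (helper names and values are hypothetical); it computes Spearman's rho as the Pearson correlation of average ranks over the shared genes, which is the same quantity `scipy.stats.spearmanr` returns.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


def spearman_between_platforms(illumina_tpm, nanopore_cpm):
    """Spearman's rho over genes quantified on both platforms."""
    shared = sorted(set(illumina_tpm) & set(nanopore_cpm))
    x = [illumina_tpm[g] for g in shared]
    y = [nanopore_cpm[g] for g in shared]
    return pearson(average_ranks(x), average_ranks(y))


illumina_tpm = {"gene.X": 50.0, "gene.Y": 10.0, "gene.Z": 5.0}   # hypothetical
nanopore_cpm = {"gene.X": 40.0, "gene.Y": 12.0, "gene.Z": 1.0,
                "gene.W": 3.0}                                   # hypothetical
rho = spearman_between_platforms(illumina_tpm, nanopore_cpm)
```

Genes present in only one table (gene.W here) are simply dropped; how to treat genes detected by one platform but not the other is a separate choice the sketch does not make for you.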
Any help here will be greatly appreciated.
Best, Bernardo
Hi, I was wondering about something similar. I have both Illumina (short) and Nanopore (long) reads. What sort of normalisation did you end up using? I have bambu output for Nanopore and Salmon output for Illumina. I think each tool applies some sort of normalisation, but what about within each sample and between samples? I have trouble coming up with anything that makes sense for my analysis other than correcting for library size.
Any input is greatly appreciated!
The author of Salmon has just posted a preprint of oarfish, a formalized adaptation of Salmon to Nanopore data:
https://www.biorxiv.org/content/10.1101/2024.02.28.582591v1
It should make the comparison easier, since both tools follow similar principles for transcript abundance estimation.