I am working on a project comparing RNA-seq quantification results between Illumina short reads and Nanopore long reads, and I have a couple of questions about how to normalize the data for comparisons within samples and between samples. So far I have come up with the following plan:
Using CPM to compare gene/transcript expression within each sample sequenced with Nanopore, for example, checking whether gene.X transcripts are more abundant than gene.Y transcripts within sample_1 sequenced with Nanopore. Using CPM instead of TPM for Nanopore seems like a good option, since our Nanopore runs do not have transcript length bias. Does this sound like a good strategy?
Using TPM to compare gene/transcript expression within each sample sequenced with Illumina, for example, checking whether gene.X transcripts are more abundant than gene.Y transcripts within sample_1 sequenced with Illumina. Using TPM instead of CPM for Illumina seems like a good option, since Illumina has transcript length bias (a single long transcript will produce more counts than a single short transcript). Does this sound like a good strategy?
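To make sure we are talking about the same quantities, here is a minimal Python sketch of how I understand CPM and TPM. The gene names, counts, and lengths are made-up toy values, and I am using the annotated transcript length for simplicity, whereas quantification tools typically use an effective length instead.

```python
def cpm(counts):
    """Counts per million: scale each count by the library size only."""
    total = sum(counts.values())
    return {name: c / total * 1e6 for name, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: divide by length (in kb), then rescale to 1e6."""
    # Reads per kilobase for each feature (assumes annotated length, not effective length).
    rate = {name: counts[name] / (lengths_bp[name] / 1000) for name in counts}
    total_rate = sum(rate.values())
    return {name: r / total_rate * 1e6 for name, r in rate.items()}


# Hypothetical toy data for one sample.
counts = {"gene.X": 300, "gene.Y": 100, "gene.Z": 600}
lengths_bp = {"gene.X": 3000, "gene.Y": 1000, "gene.Z": 2000}

cpm_values = cpm(counts)
tpm_values = tpm(counts, lengths_bp)

for name in counts:
    print(f"{name}\tCPM={cpm_values[name]:.1f}\tTPM={tpm_values[name]:.1f}")
```

In this toy case gene.X has three times the CPM of gene.Y, but the two have the same TPM because gene.X is three times longer. That is exactly the length effect I am trying to account for on the Illumina side.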
Here is where I am having trouble coming up with a good normalization strategy: comparing gene/transcript expression between the same sample sequenced with Illumina and with Nanopore, e.g., computing a Spearman correlation between gene expression in sample_1 sequenced with Illumina and sample_1 sequenced with Nanopore. I am not sure what would work here, since Illumina has transcript length bias and Nanopore does not. Do you have any suggestions?
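For reference, this is roughly the comparison I have in mind, written as a small pure-Python sketch. The two dictionaries are hypothetical toy values (Illumina TPM and Nanopore CPM for the same sample), and the helper names are my own. In practice I would probably use an existing statistics library rather than this hand-rolled version.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values for sample_1 on each platform.
illumina_tpm = {"gene.A": 10.0, "gene.B": 50.0, "gene.C": 200.0, "gene.D": 5.0}
nanopore_cpm = {"gene.A": 20.0, "gene.B": 40.0, "gene.C": 500.0, "gene.D": 1.0, "gene.E": 7.0}

# Only genes quantified on both platforms can be compared.
shared = sorted(set(illumina_tpm) & set(nanopore_cpm))
rho = spearman([illumina_tpm[g] for g in shared], [nanopore_cpm[g] for g in shared])
print(f"Spearman rho over {len(shared)} shared genes: {rho:.3f}")
```

As far as I understand, Spearman only uses ranks, so the per-sample library-size scaling should not change the result; only the length correction on the Illumina side can reorder genes, which is why the choice of unit there is what I am unsure about.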
Any help here will be greatly appreciated.
Best, Bernardo