I have a conceptual question that I was hoping someone could answer.
Can I say that microRNA A is expressed x-fold higher than microRNA B directly from the TCGA miRseq data? Can I do this after normalizing the data? Does it matter whether I use RSEM or RPKM values? It seems to me that it should be legitimate in any case, since microRNAs are approximately the same length, but maybe I am overlooking something.
For example, I am following a paper published in Nature Communications entitled "Identification of a pan-cancer oncogenic microRNA superfamily anchored by a central core seed motif". The authors download the data and collapse the isoform reads into a single read count per microRNA. They say they used reads per million microRNAs mapped, which expresses each microRNA read count as a fraction of the total microRNA population. The authors then apply upper quartile normalization, which they say is important because a subset of microRNAs (miR-143 in particular) contributes so heavily to the total read count. In the text, the authors appear to use the resulting values to make direct comparisons between microRNAs.
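To make the two normalization steps concrete, here is a minimal sketch of how I understand them, not the authors' actual code. The read counts and the names miR-A, miR-B and miR-C are made up for illustration, and I am assuming the upper quartile is taken over the non-zero counts within one sample. It also shows that both steps only rescale a sample by a constant, so the ratio between two microRNAs within that sample is the same as in the raw collapsed counts.

```python
import numpy as np

def reads_per_million(counts):
    """Scale counts so they sum to one million within a sample."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum() * 1e6

def upper_quartile_normalize(counts):
    """Divide counts by the 75th percentile of the non-zero counts.

    Assumption: the quartile is computed over non-zero entries only.
    """
    counts = np.asarray(counts, dtype=float)
    uq = np.percentile(counts[counts > 0], 75)
    return counts / uq

# Hypothetical collapsed read counts for one sample (invented values).
names = ["miR-143", "miR-21", "miR-A", "miR-B", "miR-C"]
raw = np.array([900000, 60000, 800, 200, 0])

rpm = reads_per_million(raw)
uq_norm = upper_quartile_normalize(raw)

a, b = names.index("miR-A"), names.index("miR-B")
ratio_raw = raw[a] / raw[b]
ratio_rpm = rpm[a] / rpm[b]
ratio_uq = uq_norm[a] / uq_norm[b]

# All three ratios agree, because each method multiplies the whole
# sample by a single constant.
print(ratio_raw, ratio_rpm, ratio_uq)
```

So within one sample, the choice of scaling does not change the A/B ratio; it matters when values are compared across samples.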
I definitely want the collapsed isoforms, and I think it makes sense to do the normalization. However, I would like to say that a particular microRNA is expressed x-fold higher than another. Can I do this from the collapsed and normalized data?
If this has already been answered, I apologize. I could not find it. Thanks.
After aligning the miRNA-Seq reads to the target sequences (a miRNA database), you can calculate the expression of each miRNA under two conditions and compare them as a fold change. It is usually better to normalize your data before computing the fold change. There are several normalization methods; RPKM and/or FPKM is a popular one.
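As a rough sketch of that calculation, assuming the standard RPKM definition (reads per kilobase of target per million mapped reads): the read counts, library sizes, the 22 nt length and the pseudocount below are all invented for illustration, and the function names are my own.

```python
import math

def rpkm(reads, total_mapped_reads, length_bp):
    """Reads per kilobase of target per million mapped reads."""
    return reads * 1e9 / (total_mapped_reads * length_bp)

def log2_fold_change(value_a, value_b, pseudocount=1.0):
    """log2 ratio of two expression values.

    The pseudocount (an assumption, not a fixed convention) avoids
    division by zero when a miRNA is absent in one condition.
    """
    return math.log2((value_a + pseudocount) / (value_b + pseudocount))

# Hypothetical numbers for one miRNA of length 22 nt in two conditions.
tumor = rpkm(reads=400, total_mapped_reads=2_000_000, length_bp=22)
normal = rpkm(reads=100, total_mapped_reads=1_000_000, length_bp=22)

lfc = log2_fold_change(tumor, normal)
print(f"tumor RPKM={tumor:.1f}, normal RPKM={normal:.1f}, log2FC={lfc:.3f}")
```

Here the raw counts differ 4-fold, but after accounting for the tumor library being twice as deep, the normalized difference is about 2-fold.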