Dear biostars community,
I'm having some doubts regarding which measure is ideal for reporting expression values and I thought you could help me with your experience.
I've been working with RNA-seq data from two projects (one with single-end reads, the other with paired-end reads), and I want to choose an expression measure suitable for comparing samples from different experiments.
From the literature (I've dug through a lot of blogs, papers, etc.), I've essentially summed up the following:
- Both RPKM and FPKM shouldn't be used anymore, since they contain an essentially arbitrary scaling factor that depends on the average effective length of the transcripts in the underlying sample. As a result, the values are not reproducible and not comparable across samples.
- TPM seems more appropriate for dealing with this issue, since the TPM values of every sample sum to the same total (one million), making it "more suitable" for comparing samples. However, its calculation (specifically the denominator term) is also sample-dependent, and this would be the main reason why I shouldn't use it to directly compare expression values between samples.
- CPM seems to be a less-normalized measure, since it takes into account only library size and not transcript length. At the other extreme, estimated read counts are not normalized at all, which makes them useless for my goal (unless I apply some between-sample normalization method).
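To make sure I have the definitions right, here is a minimal sketch of how I understand CPM, RPKM and TPM to be computed. The gene lengths and counts are made-up toy numbers, the function names are my own, and I use the plain transcript length instead of the effective length to keep it short (for paired-end data, FPKM would count fragments instead of reads).

```python
# Toy illustration of CPM, RPKM and TPM (hypothetical data, pure Python).
# Simplification: uses plain transcript length, not effective length.

def cpm(counts):
    """Counts per million: scale by library size only."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads."""
    total = sum(counts)
    return [c / (l / 1e3) / (total / 1e6) for c, l in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]
    denom = sum(rates)  # sample-dependent denominator
    return [r / denom * 1e6 for r in rates]

# Three hypothetical genes, two hypothetical samples.
lengths = [1000, 2000, 4000]   # transcript lengths in bp
sample_a = [100, 200, 400]     # read counts
sample_b = [100, 200, 4000]    # same as A, except gene 3 is much higher

for name, counts in [("A", sample_a), ("B", sample_b)]:
    print(name, "sum RPKM:", round(sum(rpkm(counts, lengths)), 1),
          "sum TPM:", round(sum(tpm(counts, lengths)), 1))
    print(name, "TPM:", [round(v, 1) for v in tpm(counts, lengths)])
```

In this toy case the TPM values sum to one million in both samples, while the RPKM sums differ. But gene 1 has the same raw count in A and B and still gets a much lower TPM in B, only because gene 3 takes a larger share there. That is the sample dependence of the denominator I'm worried about.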
My point is that TPM seems to be the most reliable expression measure for comparing different samples. Still, TPM only performs within-sample normalization (although there are a lot of papers comparing samples based on TPM values).
Do you think TPM is suitable for comparing expression values between samples? If not, which method would you recommend? Should I apply a between-sample normalization method?
I'm looking forward to hearing your opinion! Thanks in advance!