I have to disagree with kristoffer.vittingseerup on that matter. TPM, like all methods that rely entirely on per-million scaling without further corrections, fails to correct for changes in library composition. This is already an issue in "normal" RNA-seq when simply comparing the same cell type of the same species between conditions (such as a certain treatment). It is most likely an even bigger issue when you compare species, as between species you might have gains or losses of genes, notable expression differences, changes in gene length etc., so I do expect a notable change in composition. TPM does not account for this. It is good for comparing transcript composition within a single sample. The thing is that common normalization techniques such as the one used by DESeq2, or transformations such as rlog, all assume that most genes do not change, which I at least find questionable between species, both with respect to the biological reality and to the completeness of the reference transcriptomes, which might impose further technical difficulties.
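To make the composition point concrete, here is a minimal toy sketch in Python (NumPy only). Everything in it is made up for illustration: the five genes, their lengths and the counts are hypothetical, and `median_of_ratios_size_factors` is my own simplified re-implementation of the median-of-ratios idea, not a call into DESeq2. Sample B is sequenced twice as deep as sample A, and only gene 5 truly changes (10-fold up). The sketch also requires all counts to be non-zero.

```python
import numpy as np

def tpm(counts, lengths_bp):
    """Length-normalize counts, then rescale each sample (column) to sum to 1e6."""
    rate = counts / lengths_bp[:, None]
    return rate / rate.sum(axis=0) * 1e6

def median_of_ratios_size_factors(counts):
    """Simplified median-of-ratios size factors (assumes no zero counts)."""
    log_counts = np.log(counts)
    # Per-gene log geometric mean across samples serves as a pseudo-reference.
    log_geo_mean = log_counts.mean(axis=1)
    # Per-sample median of the gene-wise log ratios to that reference.
    return np.exp(np.median(log_counts - log_geo_mean[:, None], axis=0))

# Hypothetical toy data: rows are genes, columns are samples A and B.
lengths = np.array([1000, 2000, 1500, 1000, 3000], dtype=float)
counts = np.array([
    [100, 200],
    [100, 200],
    [100, 200],
    [100, 200],
    [100, 2000],   # the only gene that truly changes (10x after depth correction)
], dtype=float)

tpm_values = tpm(counts, lengths)
size_factors = median_of_ratios_size_factors(counts)
norm_counts = counts / size_factors

# log2 fold change B vs A per gene under each normalization.
tpm_log2fc = np.log2(tpm_values[:, 1] / tpm_values[:, 0])
mor_log2fc = np.log2(norm_counts[:, 1] / norm_counts[:, 0])

print("TPM log2FC:             ", np.round(tpm_log2fc, 2))
print("median-of-ratios log2FC:", np.round(mor_log2fc, 2))
```

In this toy case the four unchanged genes come out as apparently downregulated under TPM, simply because gene 5 takes up a larger share of the library in sample B, while the median-of-ratios factors recover a fold change of zero for them. Note that this only works because most genes here really are unchanged, which is exactly the assumption I find questionable between species.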
I am surprised that these naive per-million methods are still in use and recommended even by more experienced folks, as there is plenty of literature out there that recommends against them. Some brief examples (there are many more benchmarking papers on this out there):
As shown in Table 2, in many comparisons, Total Count and RPKM/FPKM perform worse than all other methods, and several authors expressly recommend against its use.
(...) TC and RPKM do not improve over the raw counts
If you search Biostars and the web for opinions on using RPKM/FPKM/TPM for meaningful differential analysis between samples, you will see that the statistics community strongly argues against it. The Bioconductor support site is full of threads on this, and the package maintainers of the established tools typically recommend against it.
As for your question:
=> I would definitely survey the literature for dedicated approaches that tackle all these points, e.g. SCBN (https://bioconductor.org/packages/release/bioc/html/SCBN.html), and not try naive/ad-hoc methods, as these might give you skewed results.
modified 9 months ago
9 months ago by
ATpoint ♦ 34k