Subset of transcripts - Do I need to re-scale the TPM values?
5.8 years ago
user230613

Hi!

Apologies in advance if this question is very basic =)

I have run Kallisto using the full human transcriptome dataset GRCh37-75. This dataset obviously includes both coding and non-coding transcripts, and I would like to further analyse the coding ones. Since TPM values are normalised over all transcripts in the dataset (they sum to one million across the whole index), do I need to re-scale/re-calculate the TPM value of each coding transcript using only the coding subset? And is there a difference between this approach and running Kallisto only on those coding transcripts?
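To make the denominator in question concrete, here is a minimal Python sketch of the textbook TPM definition, computed from estimated counts and effective lengths (the quantities Kallisto reports as `est_counts` and `eff_length` in `abundance.tsv`). The function name and the toy numbers are hypothetical; this illustrates the formula, not Kallisto's own code.

```python
# Minimal sketch of the TPM definition: reads per effective base for each
# transcript, scaled so that all transcripts in the index sum to one million.
# The counts and lengths below are made-up toy values, not real Kallisto output.

def tpm(est_counts, eff_lengths):
    """Return TPM values for parallel lists of estimated counts and effective lengths."""
    rates = [c / l for c, l in zip(est_counts, eff_lengths)]
    total = sum(rates)  # denominator: summed over *every* transcript in the index
    return [r / total * 1e6 for r in rates]

# Toy example: two coding transcripts followed by one non-coding transcript.
counts = [100.0, 300.0, 200.0]
lengths = [1000.0, 1500.0, 500.0]
values = tpm(counts, lengths)
print(values)
print(sum(values))  # equals one million up to floating-point rounding
```

Because the non-coding transcript contributes to `total`, dropping it from the index (or from the sum) changes every coding transcript's TPM by the same factor.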

I hope the two questions are clear enough.

RNA-Seq tpm kallisto

Not an expert, but my guess is no: you would not need to rescale just because you are looking at a subset. What if you later want to compare against the non-coding transcripts? Then you'd have to do it all again.


Mmm... that's true, but I'm not interested in the non-coding transcripts at all. That's why I posted the second question, "Is there a difference between this approach and running Kallisto only on those coding transcripts?": if you use only the coding set of transcripts when running Kallisto, the denominator in the TPM calculation would change.

5.8 years ago
BioinfGuru

The point of RPKM/FPKM/TPM is normalisation, i.e. making values for different regions of the genome (e.g. transcripts of different lengths) comparable. Whether you include or exclude any region (e.g. non-coding transcripts) in the normalisation calculation is not important if you are not going to use those regions at all anyway (many studies exclude other regions, such as blacklisted or mitochondrial regions). Excluding regions simply makes each remaining raw TPM value a greater proportion of the whole, but the relationship between those values within a sample (i.e. their ratios to one another) remains the same.
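The rescaling described above can be sketched in a few lines of Python: keep a subset of transcripts and divide by their summed TPM so they again add up to one million. The transcript IDs, values, and the `renormalise` helper are hypothetical toy data for illustration only.

```python
# Sketch: renormalise a subset of TPM values so they again sum to one million.
# Transcript IDs and TPM values are hypothetical toy data.

def renormalise(tpm_by_id, keep_ids):
    """Rescale the TPMs of the kept transcripts so they sum to 1e6."""
    subset = {tid: tpm_by_id[tid] for tid in keep_ids}
    total = sum(subset.values())
    return {tid: v / total * 1e6 for tid, v in subset.items()}

tpm_all = {"coding_A": 200000.0, "coding_B": 400000.0, "noncoding_C": 400000.0}
coding = renormalise(tpm_all, ["coding_A", "coding_B"])

# Every kept value is multiplied by the same factor (1e6 / 600000 here),
# so the ratio between two transcripts in this sample is unchanged.
ratio_before = tpm_all["coding_B"] / tpm_all["coding_A"]
ratio_after = coding["coding_B"] / coding["coding_A"]
print(coding)
print(ratio_before, ratio_after)
```

One caveat: the scaling factor depends on how much of a sample's TPM fell on the excluded transcripts, so it can differ from sample to sample. Also, this only rescales existing estimates; it is not the same as re-running Kallisto on a coding-only index, where reads that were assigned to the excluded transcripts would be redistributed among the remaining ones.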