Apologies in advance if this question is very basic =)
I have run Kallisto using the full human transcriptome dataset GRCh37-75. This dataset includes both coding and non-coding transcripts, obviously. I would like to further analyse the coding ones. Since the TPM measurement takes into account the total number of transcripts in the dataset, do I need to re-scale/re-calculate the TPM values for each coding transcript using only the coding subset? Is there a difference between this approach and running Kallisto only on those coding transcripts?
I hope the two questions are clear enough,
Thank you in advance.
Not an expert, but my guess is no: you would not need to rescale just because you are looking at a subset. What if you later want to compare against the non-coding transcripts? Then you'd have to do it all again.
Mmmm.. that's true, but I'm not interested in the non-coding transcripts at all. That's why I asked the second question, "Is there a difference between this approach and running Kallisto only on those coding transcripts?": if you use only the coding set of transcripts when running Kallisto, the denominator in the TPM calculation would change.
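To make the re-scaling option concrete, here is a minimal sketch of what "re-calculating TPM using only the coding subset" could look like: keep the coding transcripts and divide by their new total so the values sum to one million again. The function name `rescale_tpm`, the transcript IDs, and the TPM values are all made up for illustration; they are not Kallisto output. Note that this only covers the re-scaling approach. It does not reproduce a Kallisto run on a coding-only index, where the assignment of reads to transcripts may itself change.

```python
def rescale_tpm(tpm, keep):
    """Re-normalise TPM values over a subset so they sum to 1e6 again.

    tpm  -- dict mapping transcript ID to its TPM from the full run
    keep -- set of transcript IDs to retain (e.g. the coding ones)
    """
    # Keep only the transcripts of interest.
    subset = {tid: value for tid, value in tpm.items() if tid in keep}

    # The new denominator is the total TPM of the retained transcripts.
    total = sum(subset.values())
    if total == 0:
        raise ValueError("subset has zero total TPM; cannot rescale")

    return {tid: value / total * 1e6 for tid, value in subset.items()}


# Hypothetical toy values, chosen so the full set sums to 1e6.
tpm = {
    "tx_coding_1": 300000.0,
    "tx_coding_2": 100000.0,
    "tx_noncoding_1": 600000.0,
}
coding_ids = {"tx_coding_1", "tx_coding_2"}

coding_tpm = rescale_tpm(tpm, coding_ids)
for tid, value in sorted(coding_tpm.items()):
    print(tid, value)
```

Every retained transcript is multiplied by the same factor, so the ratios between coding transcripts within a sample stay exactly as they were in the full run; only the absolute scale changes.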