Question

Counts vs TPM - Comparing gene expression within a sample

2.2 years ago

jac • 0

Hello,

I'm hoping to get some clarification regarding normalized counts and other measures of abundance.

I've used DESeq2 as the final step of my RNA-Seq analysis and was planning to use the normalized counts as a proxy for the number of transcripts. After a bit of googling, I found that this resource (https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html), among other blog posts and questions, says that TPM is the most appropriate way to compare expression levels between genes within a sample.

My understanding is that this is because DESeq2 normalized counts aren't calculated based on sequencing depth and gene length.

Therefore, I was hoping to get some confirmation on:

1) I would like to be able to conclude things like "kinases make up XX% of the transcriptome in my sample of tissue Y at time Z". Is it inappropriate to calculate this by adding up the DESeq2 normalized counts of all the kinases in the genome and dividing by the sum of all the normalized counts? Or do I need to calculate TPMs to achieve this?

2) What can I conclude from the DESeq2 normalized counts? In the same sample, if gene A has 10,000 counts and gene B has 10 counts, would I be able to conclude that the abundance of gene A > gene B, but not necessarily be able to say by how many fold?

Thanks in advance!

rna tpm rnaseq deseq2 • 2.0k views

updated 2.2 years ago by seidel 11k • written 2.2 years ago by jac • 0

score 0 · Answer 1 · 2022-02-24

The transcriptome is made of molecules, but read counts are observations of those molecules, and the probability of an observation depends on two things: (1) the concentration of the molecule (transcript abundance), and (2) your probability of getting reads from that molecule (affected by transcript length).

Thus, for your first question (1): yes, it would be inappropriate to use normalized counts of a class of molecules to make statements about proportions of molecules.

For your second question (2), consider a thought experiment: gene A has 10,000 counts and gene B has 10 counts. You want to say something about their relative abundance. Imagine that gene A is 10 kb long and gene B is 10 bases long. Can you say anything convincing about their relative abundance?
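To put numbers on the thought experiment, here is a minimal Python sketch. It assumes, as a simplification, that the number of reads from a gene scales with abundance times length, so dividing counts by length gives a rough per-base read density. The variable names are illustrative only.

```python
# Numbers taken from the thought experiment above.
counts = {"A": 10_000, "B": 10}
lengths_bp = {"A": 10_000, "B": 10}  # gene A is 10 kb, gene B is 10 bases

# Reads per base: a crude length-corrected measure (simplifying assumption:
# reads are sampled in proportion to abundance x length).
reads_per_base = {gene: counts[gene] / lengths_bp[gene] for gene in counts}

# Ratio of length-corrected signal between the two genes.
ratio = reads_per_base["A"] / reads_per_base["B"]
print(reads_per_base, ratio)
```

Under that assumption the 1,000-fold difference in counts disappears once length is accounted for, which is why the raw comparison is not convincing on its own.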

TPM takes gene length into account, so if you want to compare molecules within a sample, or fractions of molecules in a sample, go ahead and calculate TPM. It's not hard.
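As a rough illustration of the calculation, here is a minimal Python sketch of TPM for one sample, followed by the "fraction of the transcriptome" question from the original post. The counts, gene lengths, and the kinase labels are made-up values for illustration, and which length to use per gene (for example, total exon length or an effective length) is an assumption you would need to settle for your own data.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts     -- per-gene read counts
    lengths_bp -- per-gene lengths in bases (assumed known; hypothetical here)
    """
    # Step 1: reads per kilobase, which removes the length bias.
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Step 2: rescale so the values in the sample sum to one million.
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical four-gene sample.
counts = [10_000, 10, 500, 2_000]
lengths_bp = [10_000, 1_000, 2_500, 4_000]
is_kinase = [True, False, False, True]  # hypothetical annotation

tpm_values = tpm(counts, lengths_bp)

# Fraction of transcripts (by TPM) that belong to the labelled class.
kinase_fraction = sum(t for t, k in zip(tpm_values, is_kinase) if k) / sum(tpm_values)
print(tpm_values, kinase_fraction)
```

Because TPM is rescaled within each sample, dividing every count in that sample by the same constant (such as a per-sample size factor) leaves the TPM values unchanged; it is the per-gene length correction that makes the difference.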