Hello,
I'm hoping to get some clarification on DESeq2 normalized counts versus other measures of abundance.
I've used DESeq2 as the final step of my RNA-Seq analysis and was planning to use the normalized counts as a proxy for the number of transcripts. After a bit of googling, I found that this resource (https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html), among other blog posts and questions, says that TPM is the most appropriate way to compare expression levels between genes within a sample.
My understanding is that this is because DESeq2 normalized counts aren't calculated based on sequencing depth and gene length.
Therefore, I was hoping to get some confirmation on:
1) I would like to be able to conclude things like "kinases make up XX% of the transcriptome in my sample of tissue Y at time Z". Is it inappropriate to calculate this by adding up the DESeq2 normalized counts of all the kinases in the genome and dividing by the sum of all the normalized counts in that sample? Or do I need to calculate TPMs to achieve this?
2) What can I conclude from the DESeq2 normalized counts? Within the same sample, if gene A has 10,000 counts and gene B has 10 counts, can I conclude that gene A is more abundant than gene B, even if I can't specify the fold difference?
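To make both questions concrete, here is a minimal Python sketch of the calculation I have in mind, assuming I have per-gene counts for one sample and a gene length for each gene. The function names (`tpm`, `gene_set_fraction`), the gene names, and all the numbers are made up for illustration, and I'm assuming plain gene lengths in base pairs rather than the effective lengths that dedicated quantification tools may use.

```python
def tpm(counts, lengths_bp):
    """Convert per-gene counts for ONE sample to TPM.

    counts: dict of gene -> read count
    lengths_bp: dict of gene -> gene length in base pairs (assumed known)
    """
    # Step 1: divide each count by gene length in kilobases (reads per kilobase).
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: rescale so that the values in this sample sum to one million.
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}


def gene_set_fraction(values, gene_set):
    """Fraction of the sample total contributed by the genes in gene_set."""
    total = sum(values.values())
    return sum(values[gene] for gene in gene_set if gene in values) / total


# Toy data, invented purely for illustration.
counts = {"geneA": 10000, "geneB": 10, "kinase1": 500}
lengths_bp = {"geneA": 10000, "geneB": 500, "kinase1": 2000}

tpm_values = tpm(counts, lengths_bp)

# Question 1: share of a gene set, computed from counts vs. from TPM.
kinases = {"kinase1"}
fraction_from_counts = gene_set_fraction(counts, kinases)
fraction_from_tpm = gene_set_fraction(tpm_values, kinases)

# Question 2: fold difference between gene A and gene B, counts vs. TPM.
fold_from_counts = counts["geneA"] / counts["geneB"]
fold_from_tpm = tpm_values["geneA"] / tpm_values["geneB"]

print(fraction_from_counts, fraction_from_tpm)
print(fold_from_counts, fold_from_tpm)
```

With these toy numbers, gene A has 1,000 times the counts of gene B but only about 50 times the length-adjusted abundance, and the kinase share also changes depending on which measure I use. That is the kind of discrepancy I'm worried about.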
Thanks in advance!