Question

Difference between abundance and counts in RNA-Seq

9


5.1 years ago

c_u ▴ 520

Hi,

I have been trying to find the difference between the above two online for a while now, but I haven't got a satisfactory answer. I also didn't find a similar question on Biostars, so I thought of formally asking it now.

Tximport (and maybe other tools too) gives several outputs for each gene, two of which are abundance and counts. What is the difference between them?

This paper gives a general idea: count-based methods assign reads to genes directly, whereas abundance-based methods estimate the abundance of each transcript with a probabilistic model that uses information such as the fragment length distribution.

So, having said that, is this really the difference between the abundance and count values that I get for any gene from Tximport (or any tool in general)? And in which situations is one of them the more meaningful or desirable quantity?

RNA-Seq tximport • 8.8k views

ADD COMMENT • link 5.0 years ago by c_u ▴ 520

1


Abundance just means a quantification of the expression level. Raw counts without any kind of normalization are not a very accurate measure of abundance, but many software tools want raw counts as input because they do their own normalization. This is probably why Tximport provides both, but I don't know the exact method Tximport uses to calculate abundance. In general, abundance could be TMM-normalized counts, TPM values, or any other kind of gene expression measure.

ADD REPLY • link 5.1 years ago by colin.kern ★ 1.1k

0


Have you checked the "Use with downstream Bioconductor DGE packages" section of the tximport vignette? That part addresses this question.

ADD REPLY • link 5.1 years ago by igor 13k

1


Hi igor, thanks for the response. Yes, I had gone through that section before and went through it again now, but I didn't find a clear explanation of the difference between abundance and counts in general.

ADD REPLY • link 5.1 years ago by c_u ▴ 520

score 9 · Accepted Answer · 2019-10-23

A count is simply that: a count of reads on some feature. An abundance is a more biologically meaningful (though not necessarily statistically useful) quantification of the expression of a gene or transcript, normalized in some way. Most commonly this is TPM or some variant of it, but it could also be "copies per cell", which is an abundance metric you could get from RT-qPCR. In other words, normalized counts aren't an abundance estimate, since reads aren't a thing present in the cell but an artifact of how we perform library prep and sequencing. The exception would be if you use a MinION or equivalent to sequence full-length transcripts, since then a normalized count would estimate the abundance (on some likely relative scale) of a transcript in a cell or tissue.
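To make the count versus TPM distinction concrete, here is a minimal sketch of the usual TPM calculation in Python. The gene names, counts, and lengths are made up for illustration, and the function uses plain feature lengths; quantifiers such as Salmon use an effective length instead, so treat this as a sketch of the idea rather than what tximport itself runs.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-feature read counts to TPM (transcripts per million).

    counts     -- dict of feature name -> number of reads assigned (hypothetical data)
    lengths_bp -- dict of feature name -> feature length in base pairs (hypothetical data)
    """
    # Step 1: divide each count by the feature length in kilobases.
    # This removes the bias of longer features collecting more reads.
    rates = {name: counts[name] / (lengths_bp[name] / 1000.0) for name in counts}

    # Step 2: rescale so that the values sum to one million.
    # This removes the dependence on sequencing depth.
    total = sum(rates.values())
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Made-up example: geneA and geneB have the same count but different lengths,
# while geneC has three times the count of geneA and three times the length.
counts = {"geneA": 100, "geneB": 100, "geneC": 300}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 3000}

tpm = counts_to_tpm(counts, lengths_bp)
for name in counts:
    print(f"{name}: count={counts[name]}  TPM={tpm[name]:.0f}")
```

In this toy data, geneA and geneB have identical counts but geneB ends up with half the TPM because it is twice as long, whereas geneC has three times the reads of geneA yet the same TPM. That is the sense in which the count describes reads and the abundance describes transcripts.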