Count vs Read
7.2 years ago
mhyunjunkang ▴ 110

Hi everyone,

I have a very basic question, but I still don't understand it. What is the difference between "count" and "read" in RNA-seq data? If I say count = read, is that right? Do I understand correctly? I'm completely new at this. I look forward to your kind explanation. Thanks in advance.

Mind,

RNA-Seq • 18k views
7.2 years ago
ddiez ★ 2.0k

They are not the same. A read is the oligonucleotide that has been sequenced. Counts are the number of reads that overlap a particular genomic position or feature. A read can map to multiple genomic positions, contributing to the counts in different ways. While the reads are immutable (i.e. just what you obtained from sequencing), counts depend on the counting strategy (see for example an introduction to this topic in this Bioconductor course).
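To illustrate how counts depend on the counting strategy, here is a minimal sketch in Python. The reads, gene names, and the three strategy names ("unique", "all", "fractional") are made up for illustration; real counting tools offer their own, more detailed modes.

```python
from collections import Counter

# Hypothetical alignments: each read (a sequence) with the genes it maps to.
alignments = [
    ("AGCTGGA", ["geneA"]),
    ("AGGAAGT", ["geneA"]),
    ("TTGACCA", ["geneA", "geneB"]),  # a multi-mapping read
]

def count_reads(alignments, strategy):
    """Tally reads per gene; the result depends on how multi-mappers are handled."""
    counts = Counter()
    for _read, genes in alignments:
        if strategy == "unique":
            # Only count reads that map to exactly one gene.
            if len(genes) == 1:
                counts[genes[0]] += 1
        elif strategy == "all":
            # Count the read once for every gene it maps to.
            for gene in genes:
                counts[gene] += 1
        elif strategy == "fractional":
            # Split the read evenly across the genes it maps to.
            for gene in genes:
                counts[gene] += 1 / len(genes)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return dict(counts)

for strategy in ("unique", "all", "fractional"):
    print(strategy, count_reads(alignments, strategy))
```

The same three reads give geneA a count of 2, 3, or 2.5 depending on the strategy, while the reads themselves never change.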


Thank you, ddiez, for a kind explanation.

Then, can I say that "count", not "read", is used to calculate RPKM (reads per kilobase per million)? Did I understand your explanation correctly?

Thanks, HJ


Mmm, I'm not sure what exactly you are trying to say. Maybe this excerpt from the document I linked in the answer can help, especially the second paragraph.

The summary process tallies the number of reads aligning in each region (e.g., gene) of interest. The simplest method is to simply count reads overlapping each region, dividing by the length of the region of interest to accommodate differences in gene length. This is the ‘RPKM’ (reads per kilobase per million reads) of Mortazavi et al. [11]. One problem with this approach is that reads are not sampled uniformly across genes (Figure 1; [12]), so gene length (the ‘PK’ part of RPKM) is not a good proxy for expression level.

More fundamentally, each read represents an observation, and contributes to the certainty with which a gene is measured as ‘expressed’. A summary measure like RPKM fails to incorporate uncertainty – a particular value of RPKM might result from alignment of one or 100 reads. This contrasts with a simple count of the number of reads in the region of interest. Furthermore, count data has known statistical properties that can be exploited in downstream statistical analysis. Thus the result of summarization most useful for assessing differential expression is read count.
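As a rough sketch of the point in the excerpt, the usual RPKM formula can be written in a few lines of Python. The function name `rpkm` and all the numbers below are invented for illustration; they are not taken from the linked document.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    return read_count * 1_000_000_000 / (gene_length_bp * total_mapped_reads)

# Two hypothetical genes in a library of one million mapped reads.
short_gene = rpkm(read_count=1, gene_length_bp=500, total_mapped_reads=1_000_000)
long_gene = rpkm(read_count=100, gene_length_bp=50_000, total_mapped_reads=1_000_000)

print(short_gene, long_gene)
```

Both genes end up with the same RPKM even though one is supported by a single read and the other by 100, which is the uncertainty the excerpt says RPKM hides and the raw read count keeps.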


I appreciate your help....

But it gives me more confusion. "read count"..-.-?

So... according to your first comment, "A read is the oligonucleotide that has been sequenced", a read must be something like "AGTCGATTA....". So a "read" is not a number and cannot be used to calculate RPKM... Am I right?

And... "Counts are the number of reads"... I understand this part like this: if two reads (e.g. "AGCTGGA" and "AGGAAGT") are mapped to Gene A, then the count of Gene A is 2. So this number "2" is used to calculate RPKM...

Did I understand correctly?

Look forward to your advice.

Thanks, HJ


So "read" is not number and cannot be used to calculate RPKM... Am I right?

Yes, that is correct. A read is a sequence; the number of reads (i.e. the read count) is the number.


Thank you, ddiez, for the kind explanation and helping me clear that up.. HJ


To answer your question more directly, read counts are used to compute RPKM.

7.2 years ago
Charles Plessy ★ 2.9k

In addition to ddiez's answer, some RNA-seq methods use unique molecular identifiers, and in this context a "count" is sometimes a shorthand for "molecule count".
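A small Python sketch of the difference between a read count and a molecule count, assuming each read has already been assigned to a gene and carries a UMI. The data are made up, and this simplified version ignores UMI sequencing errors and mapping positions, which real deduplication tools take into account.

```python
from collections import Counter, defaultdict

# Hypothetical reads as (gene, UMI) pairs. Reads sharing a gene and a UMI
# are assumed to be PCR copies of the same original molecule.
reads = [
    ("geneA", "AACG"),
    ("geneA", "AACG"),  # duplicate of the read above
    ("geneA", "TTGC"),
    ("geneB", "GGAT"),
]

def read_counts(reads):
    """Number of reads assigned to each gene."""
    return dict(Counter(gene for gene, _umi in reads))

def molecule_counts(reads):
    """Number of distinct UMIs seen for each gene."""
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}

print(read_counts(reads))
print(molecule_counts(reads))
```

Here geneA has three reads but only two molecules, so the meaning of "count" depends on which of the two a given method reports.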


Thank you for the good information... Terminology always makes me confused... Probably it's only me... HJ


Don't worry about it, and keep reading and learning. I still remember being very confused about many terms related to NGS when I had just started working on it (and I am still confused about many things...).
