Question: What exactly is sequencing depth in RNAseq?
1
13 days ago by
carmacae10
carmacae10 wrote:

I am new to learning about RNAseq analysis and am confused as to what exactly sequencing depth refers to. For example, if I need to calculate "how deep each sample was sequenced" does this refer to the total number of paired end reads that came out of the sequencer OR the number of paired end reads that mapped?

depth rna-seq • 302 views
modified 13 days ago by Charles Plessy2.3k • written 13 days ago by carmacae10

Hi Carmacae,

If an answer was helpful, you should upvote it; if an answer resolved your question, you should mark it as accepted.

Cheers,
Wouter

written 11 days ago by WouterDeCoster23k
3
13 days ago by
Belgium
WouterDeCoster23k wrote:

Depth is a term commonly used for genome or exome sequencing, where it means the number of reads covering each position. For RNA-seq that measure is not very meaningful, since the coverage pattern is so uneven due to differences in expression.
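
To make the per-position definition concrete, here is a minimal Python sketch. The read intervals and the 20-base reference are made up for illustration, and `per_base_depth` is a hypothetical helper, not part of any aligner or package. It counts how many reads cover each position and shows how uneven the result gets when one region is "expressed" far more than another.

```python
def per_base_depth(reads, length):
    """Return a list where entry i is the number of reads covering position i.

    reads: iterable of (start, end) intervals, 0-based, end exclusive.
    length: length of the reference being covered.
    """
    depth = [0] * length
    for start, end in reads:
        # Clip each read to the reference boundaries before counting.
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    return depth


# Toy reference of 20 bases: many reads on the first half (a highly
# expressed region), a single read on the second half.
reads = [(0, 5), (2, 7), (3, 8), (5, 10), (4, 9), (12, 17)]
depth = per_base_depth(reads, 20)
mean_depth = sum(depth) / len(depth)

print("per-base depth:", depth)
print("mean depth:", mean_depth)
print("max depth:", max(depth), "min depth:", min(depth))
```

The mean hides the fact that some positions are covered four times and others not at all, which is why a single depth figure says little about an RNA-seq sample.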

More commonly, in RNA-seq the term "number of reads" is used, for example 10 million reads or 100 million reads. If all goes well, a high percentage of the reads should align (~90%), which makes the number of reads out of the sequencer not so different from the number of mapped reads. I would report the number of reads sequenced unless it's very different from the number aligned. In that case you should also figure out why the difference is so big.

written 13 days ago by WouterDeCoster23k

Thank you for the clear explanation! I was wondering because I've got some data to play around with in which most of the samples have around a 70% alignment rate (not too surprising, as the genome quality isn't very good). Say I have 30 million total reads of which 70% mapped: would my number of reads then be 21 million?

written 13 days ago by carmacae10

Number of reads is still 30M out of which 70% mapped. Why the rest did not is something you could investigate. They could be rRNA, contamination or just plain who knows what (though that fraction should generally be very small).
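
As a quick sanity check of the numbers in this example, a small Python sketch (the variable names are just illustrative) keeps the two quantities separate: reads sequenced versus reads mapped.

```python
total_reads = 30_000_000   # reads (or read pairs) that came off the sequencer
alignment_rate = 0.70      # fraction of those reads that mapped

# round() avoids floating-point noise when converting back to a read count.
mapped_reads = round(total_reads * alignment_rate)
unmapped_reads = total_reads - mapped_reads

print(f"sequenced: {total_reads:,}")
print(f"mapped:    {mapped_reads:,}")
print(f"unmapped:  {unmapped_reads:,}")
```

The 9 million unmapped reads are the part worth investigating; the number of reads for the sample stays at 30 million.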

modified 13 days ago • written 13 days ago by genomax37k

Right. I'm asking, though, because I'm playing around with different packages, mostly just to learn, and one of them (the CQN package) specifically asks for a vector containing "... the sizeFactors which simply tells us how deep each sample was sequenced". So in my example, would I use 30M or 21M for this?

Here's the package in case you're curious: http://bioconductor.org/packages/release/bioc/vignettes/cqn/inst/doc/cqn.pdf

written 13 days ago by carmacae10

Those sizeFactors would only matter if there is a big difference between samples, say one sequenced to 20M reads and another to 80M reads. If the alignment fraction is similar for all samples, it again doesn't really matter which of the two numbers you use.

written 13 days ago by WouterDeCoster23k

Got it! My alignments range from around 69% to around 73%, so pretty similar? So for this specific example, I could set the sizeFactors to NULL and it would be fine? Thank you so much!!

written 13 days ago by carmacae10

Seems pretty similar indeed. I don't know this particular package, so I can't say what setting it to NULL does, but setting the sizeFactors to the total number of reads per sample (sequenced or aligned, whichever) might be the most accurate.
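
To illustrate why the choice hardly matters when alignment rates are similar, here is a rough Python sketch. This is not the cqn API (cqn is an R package, and its documentation is the place to check what NULL does); the three samples and the helper `relative_size_factors` are invented for the example. It compares relative size factors computed from sequenced reads with those computed from mapped reads.

```python
def relative_size_factors(depths):
    """Scale per-sample read totals so that their mean is 1."""
    mean_depth = sum(depths) / len(depths)
    return [d / mean_depth for d in depths]


# Hypothetical samples with alignment rates in the 69-73% range.
sequenced = [30_000_000, 28_000_000, 32_000_000]
alignment_rate = [0.70, 0.69, 0.73]
mapped = [round(n * r) for n, r in zip(sequenced, alignment_rate)]

by_sequenced = relative_size_factors(sequenced)
by_mapped = relative_size_factors(mapped)

for s, m in zip(by_sequenced, by_mapped):
    print(f"sequenced-based: {s:.3f}   mapped-based: {m:.3f}")

largest_gap = max(abs(s - m) for s, m in zip(by_sequenced, by_mapped))
print(f"largest difference between the two choices: {largest_gap:.3f}")
```

With these made-up numbers the two sets of factors differ by only a few percent, whereas a 20M versus 80M sample pair would differ by a factor of four whichever total you pick.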

written 13 days ago by WouterDeCoster23k

Since they removed genes with 0 counts in all samples, they are only considering those that have mapped reads. As @Wouter points out, if the numbers are unbalanced across the dataset, then you would need to account for that.

modified 13 days ago • written 13 days ago by genomax37k

Ahhhh got it. So in my case, if mine are all around 69-73%, I could set sizeFactors to NULL and be okay? Thank you so much, this is so helpful.

written 13 days ago by carmacae10
2
13 days ago by
Tao140
Tao140 wrote:

I want to give you a very intuitive, though maybe not very accurate, explanation. Imagine each base (A/G/C/T) is a grain of rice. Suppose you sequenced a bag of rice (say 100M bases) and then poured it into a very narrow but very long rice box whose width is 1 grain of rice. The height of the rice in the box would be the sequencing depth. For WGS, the length of the box would be the length of the whole genome. For RNA-seq, people might only consider reads mapped onto exons, and take the length to be the union of the exon lengths.
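
In numbers, the rice box works out as total bases divided by the length of the box. A minimal Python sketch follows; the genome and exon lengths are made-up values chosen only to give round results, and `mean_depth` is a hypothetical helper.

```python
def mean_depth(total_bases, target_length):
    """Average depth: sequenced bases spread evenly over the target length."""
    return total_bases / target_length


total_bases = 100_000_000        # the "bag of rice" from the analogy

# Hypothetical lengths, for illustration only.
genome_length = 5_000_000        # WGS: the box is as long as the genome
exon_union_length = 1_000_000    # RNA-seq: the box is the union of exons

wgs_depth = mean_depth(total_bases, genome_length)
# Simplifying assumption: every sequenced base maps onto an exon.
rnaseq_depth = mean_depth(total_bases, exon_union_length)

print(f"WGS mean depth:     {wgs_depth:.0f}x")
print(f"RNA-seq mean depth: {rnaseq_depth:.0f}x")
```

This is only the average height of the rice; in a real RNA-seq sample the rice piles up on highly expressed genes and barely covers the rest.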

modified 13 days ago • written 13 days ago by Tao140
0
13 days ago by
Charles Plessy2.3k
Japan
Charles Plessy2.3k wrote:

"How deep each sample was sequenced" means how many reads (or pairs) were sequenced for each sample. Some results will vary with the number of sequenced reads (the sequencing depth), so it is important information to report.

written 13 days ago by Charles Plessy2.3k