Question

RNA-Seq sequencing depth information from NCBI

10.0 years ago

alittleboy ▴ 220

I am working on several datasets downloaded from NCBI SRA. For example, the zebrafish dataset accessible from here is used for my analysis. However, I am not sure how I can obtain the sequencing depth of this experiment from the metadata (say, the SraRunInfo.csv file). I summarized the data into a read count matrix and calculated the column sums:

Gata4.1     Gata4.2     Gata4.3     WT.1        WT.2        WT.3
16779626    37906530    12574486    19009161    21119399    15302189

It seems that different samples have different sequencing depths... is that correct? So, what is (are) the sequencing depth(s) for this experiment?

Thanks!

RNA-Seq seqencing-depth • 4.1k views

updated 2.6 years ago by Ram 43k • written 10.0 years ago by alittleboy ▴ 220

Ram · Answer 1 · 2014-04-18

10.0 years ago

Devon Ryan 104k

It's RNAseq, so why do you care about depth? It will vary by orders of magnitude between different genes. And yes, there will be different numbers of reads for each sample; that's completely normal.

If you really wanted to calculate it, a good estimate of the average depth would be the numbers you already calculated times the average read length (I'm assuming you did some trimming), divided by the size of whatever transcriptome you used. Realistically, you'd probably whittle down the transcripts by excluding those with no or very few reads in all of the samples, so "effective transcriptome size" is perhaps the better denominator.
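As a rough illustration of that estimate, here is a minimal Python sketch. The read totals are the column sums from the question; the mean read length and the effective transcriptome size are placeholder assumptions, not values from this dataset, so the printed depths are only illustrative.

```python
def average_depth(n_reads, mean_read_length, transcriptome_size):
    """Rough average depth: total sequenced bases / transcriptome size (in bases)."""
    return n_reads * mean_read_length / transcriptome_size


# Column sums reported in the question (reads counted per sample).
column_sums = {
    "Gata4.1": 16779626,
    "Gata4.2": 37906530,
    "Gata4.3": 12574486,
    "WT.1": 19009161,
    "WT.2": 21119399,
    "WT.3": 15302189,
}

# ASSUMPTIONS: both values are placeholders chosen for illustration only.
MEAN_READ_LENGTH = 50                  # bases, after trimming
EFFECTIVE_TRANSCRIPTOME = 60_000_000   # bases, transcripts with reads only

depths = {
    sample: average_depth(n, MEAN_READ_LENGTH, EFFECTIVE_TRANSCRIPTOME)
    for sample, n in column_sums.items()
}

for sample, depth in depths.items():
    print(f"{sample}: ~{depth:.1f}x average depth")
```

Because the same read length and transcriptome size are used for every sample, the resulting depths differ between samples in exactly the same proportion as the read totals do.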

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Devon Ryan 104k

Thank you, Devon, for your comments. I think sequencing depth in RNA-Seq is an important aspect that biologists and statisticians must be aware of. For example, in this paper: http://www.ncbi.nlm.nih.gov/pubmed/24319002, the authors discuss sequencing depth and replication in RNA-Seq experiments. The data they analyzed can be accessed here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51403.

Now, for that dataset, I summarized the read counts. The data with a sequencing depth of 30 million have column sums:

25578855 25820716 25741936 25179460 25896403 25558230 26002218 25561605 25318388 26253440 24026599 25906691 26154717 25869477

... and the data with a sequencing depth of 5 million have column sums:

4270832 4301926 4287752 4189579 4314886 4260306 4336277 4261486 4217501 4372894 4002337 4312077 4360152 4317814

The difference is obvious. My question is: how can we find the sequencing depth from NCBI, that is, from the metadata? It seems that different samples may have different depths (at least in the other datasets I checked). Thanks again for your suggestions.

updated 4.3 years ago by Ram 43k • written 10.0 years ago by alittleboy ▴ 220

They discuss read number, which is only vaguely related to depth. Read number is good to know and you already calculated it. Read depth is a largely meaningless concept in RNAseq.

10.0 years ago by Devon Ryan 104k

Also, while you can get raw read counts from the metadata, that's not what's interesting. The interesting number is the number of aligned reads, which generally won't be available in SRA or GEO, since they contain only the raw reads.
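For the raw totals, a minimal Python sketch of reading them from a run-info table could look like the following. It assumes the SraRunInfo.csv export has `Run`, `spots`, `bases` and `avgLength` columns; check the header of your own file. The two rows in the example are invented for illustration and are not real accessions.

```python
import csv
import io


def raw_read_totals(handle):
    """Map each run accession to its raw spot count, base count and mean length."""
    totals = {}
    for row in csv.DictReader(handle):
        totals[row["Run"]] = {
            "spots": int(row["spots"]),            # raw spots (one per read or read pair)
            "bases": int(row["bases"]),            # total sequenced bases
            "avg_length": float(row["avgLength"]), # mean bases per spot
        }
    return totals


# ASSUMPTION: a made-up two-row table standing in for a real SraRunInfo.csv.
example = io.StringIO(
    "Run,spots,bases,avgLength\n"
    "SRR0000001,20000000,1000000000,50\n"
    "SRR0000002,5000000,250000000,50\n"
)

totals = raw_read_totals(example)
for run, info in totals.items():
    print(run, info["spots"], info["bases"])
```

With a real file, `open("SraRunInfo.csv", newline="")` would replace the `io.StringIO` object. These are still raw totals, so they say nothing about how many reads aligned.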

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Devon Ryan 104k

Thanks for your reply. I processed the SRA files from NCBI following a published pipeline (e.g. http://www.nature.com/nprot/journal/v8/n9/full/nprot.2013.099.html ) and used HTSeq to do the count summarization. I think that's the typical kind of data statisticians work with for methodological development. Correct me if I am wrong, but I think count-based approaches built on a read count table are, in some sense, suitable for DE analysis in RNA-Seq.

updated 4.3 years ago by Ram 43k • written 10.0 years ago by alittleboy ▴ 220

Yes, the counts from htseq-count are directly useable.
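If you sum a single htseq-count output to get a per-sample total like the column sums above, something along these lines may help. It is only a sketch: it assumes tab-separated `feature<TAB>count` lines and that the summary counters at the end of the file (such as `__no_feature` and `__ambiguous`) carry a double-underscore prefix, which may not hold for older HTSeq releases. The example lines are made up.

```python
def assigned_read_total(lines):
    """Sum per-feature counts, skipping summary counters that start with '__'."""
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        feature, count = line.split("\t")
        if feature.startswith("__"):
            continue  # summary counter, not a gene
        total += int(count)
    return total


# ASSUMPTION: invented lines in the shape of an htseq-count output file.
example = [
    "geneA\t120\n",
    "geneB\t0\n",
    "geneC\t35\n",
    "__no_feature\t400\n",
    "__ambiguous\t12\n",
]

total = assigned_read_total(example)
print(total)
```

Leaving the summary counters in would inflate the total with reads that were never assigned to a gene.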

10.0 years ago by Devon Ryan 104k