Question: a question about some "basic numbers" for rna-seq experiments
5.4 years ago by
United States
CrazyB220 wrote:

From my limited experience with genomic DNA NGS, I can calculate how many genomic samples (WES (~50 Mb), WGS (~3 Gb), or targeted sequencing) to pool in one HiSeq 2500 lane and how many lanes a project needs (assuming ~50-80 Gb of data per lane), and use that to estimate the total cost of the experiment. This does not seem to translate well to RNA-seq. Could anyone give me pointers on (for lack of a better word) "RNA-seq statistics" that would help me estimate the cost of an RNA-seq experiment? For example:

(a) In one HiSeq 2500 lane, what is the total number of reads I could get from 2 ug of mRNA with a standard cDNA PCR cycle (5 min reverse transcription + 3 min PCR extension)?

(b) Nature Methods 6, 377 - 382 (2009) reports, for example, that with a modified cDNA PCR cycle (extending the reaction time from 5 min to 30 min for reverse transcription and from 3 min to 6 min for PCR extension), the authors got 100 million 50-base reads, i.e. ~5 Gb. If we assume 50-80 Gb of data per lane on an Illumina HiSeq 2500, does this mean that with the Nature Methods protocol one could pool 10 different samples in one lane?

(c) What is the average number of transcript reads for a sample with 2 ug of mRNA as starting material?

(d) The UCSC hg19 statistics report a total of 34702 genes. On average, how many of these 34702 genes are transcribed? What would be the average number of transcripts per gene?
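
The arithmetic behind (b) can be written out as a minimal sketch. It assumes the whole lane yield is usable and split evenly between samples, which a real run will not quite achieve; the function names are illustrative only.

```python
def yield_gb(n_reads, read_length_bp):
    """Total sequenced bases in gigabases (1 Gb = 1e9 bases)."""
    return n_reads * read_length_bp / 1e9


def samples_per_lane_by_bases(lane_yield_gb, sample_yield_gb):
    """Whole number of samples that fit in one lane at the given per-sample yield."""
    return int(lane_yield_gb // sample_yield_gb)


# Figures quoted in (b): 100 million 50-base reads per sample, 50-80 Gb per lane.
per_sample = yield_gb(100_000_000, 50)
low = samples_per_lane_by_bases(50, per_sample)
high = samples_per_lane_by_bases(80, per_sample)
print(f"{per_sample} Gb per sample, {low}-{high} samples per lane")
```

Under those assumptions a lane would hold roughly 10 to 16 such samples.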

Any info on any of these, and any other references on RNA-seq experiments, would be greatly appreciated.

Thank you



rna-seq • 2.8k views

please learn to ask one question per post.

written 5.4 years ago by Ido Tamir5.1k
5.4 years ago by
United States
Adamc640 wrote:

Regarding (a) and (c), more information is required: a number of variables affect the number of reads per run. Besides factors such as library prep and amount of input material (which I haven't been able to strongly associate with the number of output reads, though that may just be because we don't run enough samples to establish a correlation), you'll need to know the run mode (high output or rapid run) and the version of the chemistry the instrument is set up for. There was a huge jump in the number of reads between v3 and v4 chemistry on the HiSeq, and with v4 we can get more high-quality 125 bp reads than we could get 100 bp reads on v3, for pretty much the same cost.

On (d): from the standpoint of annotation, it's not just protein-coding transcripts that get called "genes". You'll probably want to count only the ones annotated as "protein coding". You should be able to get that info through BioMart, and I imagine there's also some way to do it through the UCSC browser.
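
If you have an annotation file locally, the same count can be sketched in a few lines of Python. This is only an illustration, and it assumes a GENCODE- or Ensembl-style GTF in which gene records carry a `gene_type` or `gene_biotype` attribute; the sample records and gene IDs below are made up for the example.

```python
import re
from collections import Counter

# GENCODE GTFs use gene_type, Ensembl GTFs use gene_biotype.
_BIOTYPE = re.compile(r'gene_(?:bio)?type "([^"]+)"')


def count_gene_biotypes(gtf_lines):
    """Count 'gene' records per biotype from an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # skip transcript, exon, CDS ... records
        match = _BIOTYPE.search(fields[8])
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Hypothetical records, for illustration only.
sample = [
    "#description: toy example\n",
    'chr1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";\n',
    'chr1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";\n',
    'chr1\ttoy\tgene\t2000\t2500\t.\t-\t.\tgene_id "G2"; gene_biotype "lncRNA";\n',
    'chr2\ttoy\tgene\t50\t700\t.\t+\t.\tgene_id "G3"; gene_type "protein_coding";\n',
]
print(count_gene_biotypes(sample)["protein_coding"])
```

For a real file you would pass an open file handle instead of the `sample` list, e.g. `count_gene_biotypes(open(path))`, where `path` points at your own GTF.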

5.4 years ago by
Ido Tamir5.1k wrote:

You can expect 250-300M reads per lane on a HiSeq 2500. This is independent of the amount of input to the library preparation step. The ENCODE guidelines talk about 30M reads per experiment, so by this standard you can pool 8-10 samples per lane.
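
As a rough sketch of that calculation, extended to the number of lanes a project needs: it assumes reads are split evenly between pooled samples, and the 24-sample project in the usage line is a made-up example. Function names are illustrative.

```python
import math


def samples_per_lane_by_reads(reads_per_lane, reads_per_sample):
    """Whole number of samples that reach the target depth in one lane."""
    return reads_per_lane // reads_per_sample


def lanes_needed(n_samples, reads_per_lane, reads_per_sample):
    """Lanes required so that every sample reaches the target depth."""
    per_lane = samples_per_lane_by_reads(reads_per_lane, reads_per_sample)
    if per_lane == 0:
        raise ValueError("target depth exceeds the yield of a single lane")
    return math.ceil(n_samples / per_lane)


target = 30_000_000  # ~30M reads per sample
print(samples_per_lane_by_reads(250_000_000, target))  # lower end of lane yield
print(samples_per_lane_by_reads(300_000_000, target))  # upper end of lane yield
print(lanes_needed(24, 250_000_000, target))           # hypothetical 24-sample project
```

Multiplying the lane count by whatever your sequencing center quotes per lane then gives a first cost estimate.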




5.4 years ago by
Taiwan/Taichung/China Medical University Hospital
Gary480 wrote:

In our lab we pool 6 samples per lane, but it depends on the case. The best approach is to look at recently published papers whose biological questions and experimental design are very similar to yours. You can also ask the technicians at the sequencing center you will send your samples to. In fact, not only the price but also the number of reads per lane varies between sequencing centers.
