Question: How To Check For The Saturation Of The Library?
6.7 years ago by
Philadelphia
Ashutosh Pandey12k wrote:

Dear All,

It may be a trivial question, but what is the best way to tell whether resequencing a transcriptomic library at a higher depth will yield additional results? Let's assume I have a library that was sequenced to a depth of 5 million reads. About 40% of the alignments have non-unique start sites (PCR duplicates). Now I want to know whether resequencing the same library to a depth of 20 million reads will add new results to those I already generated from the 5-million-read run. In other words, is it worth paying extra if the deeper run adds no new information? I can perform the following comparative analyses after running the same library at a depth of 10 million reads. The analyses would compare the following results from the two runs:

1) Compare the number of expressed genes (>10 RPKM) in the sample with 5 million reads and with 10 million reads. If I find a substantial increase in the number of expressed genes, then running the library at a higher depth makes sense. Similarly, I can also look at the number of differentially expressed genes between conditions 1 and 2 in the sample with 5 million reads and the sample with 10 million reads.

2) A similar analysis as above, but for splice junctions. A substantial increase in the number of reads aligning to exon-exon junctions may be useful.

3) I can combine the two runs and check the rate of PCR duplicates. If it stays about the same (~40%) and doesn't shoot up dramatically, then I may be adding new reads.
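
One way to approximate comparisons 1) and 3) before paying for a deeper run is to downsample the existing 5-million-read data and see whether the number of detected genes is still climbing as the sampled fraction approaches 1. The Python sketch below is only an illustration under stated assumptions: `counts` is a hypothetical array of per-gene read counts (simulated here, not real data), the function names are made up for this example, and detection uses a raw read-count cutoff (`min_reads`) rather than the >10 RPKM cutoff above, since RPKM is already normalized for depth.

```python
import numpy as np


def detected_genes(counts, fraction, min_reads=10, seed=0):
    """Thin each gene's read count to `fraction` of the library and
    return how many genes still have at least `min_reads` reads."""
    rng = np.random.default_rng(seed)
    # Binomial thinning: each read is kept independently with prob. `fraction`.
    thinned = rng.binomial(counts, fraction)
    return int((thinned >= min_reads).sum())


def saturation_curve(counts, fractions, min_reads=10, seed=0):
    """Return (fraction, detected gene count) pairs for each fraction."""
    return [(f, detected_genes(counts, f, min_reads, seed)) for f in fractions]


if __name__ == "__main__":
    # Simulated per-gene counts (assumption: stands in for a real count table).
    rng = np.random.default_rng(1)
    counts = rng.poisson(rng.lognormal(mean=2.0, sigma=2.0, size=20000))
    for fraction, n_genes in saturation_curve(counts, [0.1, 0.25, 0.5, 0.75, 1.0]):
        print(f"{fraction:.2f} of reads: {n_genes} genes with >= 10 reads")
```

If the curve is still rising steeply near fraction 1.0, deeper sequencing is likely to detect more genes. If it has flattened, extra depth mostly adds reads to genes that are already detected. The same thinning idea could be applied to junction-spanning read counts.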

Feel free to comment or add your suggestions. Also, if there are good reviews on this topic somewhere, please post them here.

rna-seq library • 2.1k views
ADD COMMENTlink modified 6.7 years ago by Charles Warden7.9k • written 6.7 years ago by Ashutosh Pandey12k

Just want to add that we are interested in splice junction discovery too.

ADD REPLYlink written 6.7 years ago by Ashutosh Pandey12k
6.7 years ago by
Charles Warden7.9k
Duarte, CA
Charles Warden7.9k wrote:

I think PCR duplicates are hard to deal with in RNA-Seq data, but I would say you generally want 10 million reads. After that, I think replicates are more important than coverage.
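
To put a rough number on the duplicate rate mentioned in the question, a simple Lander-Waterman-style model can be fit to the 5-million-read run: if N reads are drawn at random from a library of C distinct molecules, the expected number of distinct molecules observed is C(1 - exp(-N/C)). The Python sketch below solves for C by bisection and extrapolates to deeper runs. It is an illustration, not a recommended method: the function names are hypothetical, and the model assumes every molecule is equally likely to be sequenced, which RNA-Seq violates because highly expressed genes produce reads with identical start sites without any PCR duplication. The estimate is therefore likely to be pessimistic.

```python
import math


def expected_distinct(total_reads, library_size):
    """Expected distinct molecules seen after `total_reads` random draws
    from a library of `library_size` equally likely molecules."""
    return library_size * (1.0 - math.exp(-total_reads / library_size))


def estimate_library_size(total_reads, distinct_reads, iterations=200):
    """Solve distinct = C * (1 - exp(-N / C)) for C by bisection."""
    if not 0 < distinct_reads < total_reads:
        raise ValueError("need 0 < distinct_reads < total_reads")
    # expected_distinct increases with C and is always below C,
    # so C = distinct_reads is a valid lower bracket.
    lo = hi = float(distinct_reads)
    while expected_distinct(total_reads, hi) < distinct_reads:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if expected_distinct(total_reads, mid) < distinct_reads:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    total = 5_000_000      # reads in the existing run (from the question)
    distinct = 3_000_000   # 60% non-duplicate alignments (from the question)
    size = estimate_library_size(total, distinct)
    print(f"estimated library size: {size:,.0f}")
    for depth in (5_000_000, 10_000_000, 20_000_000):
        seen = expected_distinct(depth, size)
        print(f"{depth:,} reads: {seen:,.0f} distinct, "
              f"{1 - seen / depth:.0%} duplicates")
```

With 5 million reads and 40% duplicates, this model puts the library at roughly 4.4 million distinct molecules, so a 20-million-read run would add only about 1.4 million new distinct start sites under these assumptions.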

You can see some more detailed statistics in this article:

http://www.ncbi.nlm.nih.gov/pubmed/24319002

ADD COMMENTlink written 6.7 years ago by Charles Warden7.9k

Thanks for the paper.

ADD REPLYlink written 6.7 years ago by Ashutosh Pandey12k

Sure, no problem.

Splice junction discovery will be a bit of another story. Unlike gene expression (which I think is OK for single-end), you'll want paired-end (and/or longer read) data and higher coverage (perhaps starting with 20 million reads? not as sure in that case).

ADD REPLYlink written 6.7 years ago by Charles Warden7.9k

I have heard that paired-end is better for splice junction discovery. Is it only because paired-end reads can be mapped more confidently than single-end reads? After all, single-end reads can be soft-clipped by the aligner too.

ADD REPLYlink written 6.7 years ago by Ashutosh Pandey12k

In practice, I know that MATS could report splicing events for a sample when it was processed as a paired-end library, but it would not call any events for that same sample as a single-end library.

I think it is an issue with being able to confidently identify the mapping for fragments of a 100 bp read. It might have been a different story if I had access to 300 bp reads. So, I think the short answer is "yes".

ADD REPLYlink written 6.7 years ago by Charles Warden7.9k

Thanks. If you know of, or come across, a reference paper that discusses the inefficiency of single-end reads for detecting splice junctions, please let me know.

ADD REPLYlink written 6.7 years ago by Ashutosh Pandey12k