This may be a trivial question, but what would be the best way to tell whether resequencing a transcriptomic library at higher depth will generate extra results? Assume I have a library that was sequenced to a depth of 5 million reads. The fraction of alignments with non-unique start sites (putative PCR duplicates) is ~40%. Now I want to know whether resequencing the same library to a depth of 20 million reads would add new results to those I already have from the 5-million-read run; in other words, is it worth paying extra money if it adds no new information? As an intermediate test, I could first rerun the same library to a depth of 10 million reads and perform the following comparative analyses between the two runs:
1) Compare the number of expressed genes (>10 RPKM) between the 5-million-read and 10-million-read samples. If I find a substantial increase in the number of expressed genes, then sequencing the library deeper makes sense. Similarly, I could compare the number of differentially expressed genes between condition 1 and condition 2 at each depth.
2) The same analysis as above, but for splice junctions. A substantial increase in the number of reads aligning across exon-exon junctions would suggest that deeper sequencing is informative.
3) Combine the two runs and check whether the PCR-duplicate rate stays roughly the same (~40%). If it does not shoot up dramatically, the additional reads are likely contributing new molecules rather than resampling old ones.
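For checks 1) and 2), one way to get an answer without paying for a second run at all is to subsample the alignments you already have and see whether the number of detected genes is still climbing with depth; if the curve has flattened by 5 million reads, more depth is unlikely to add detections. Here is a minimal sketch, where `read_gene_assignments` (one gene label per aligned read) and the toy data are hypothetical stand-ins for whatever your counting pipeline produces, and a raw-count cutoff stands in for the >10 RPKM threshold:

```python
import random
from collections import Counter

def genes_detected(read_gene_assignments, depth, threshold=10, seed=0):
    """Subsample `depth` reads without replacement and count genes
    reaching `threshold` reads (a crude proxy for an RPKM cutoff)."""
    rng = random.Random(seed)
    sample = rng.sample(read_gene_assignments, depth)
    counts = Counter(sample)
    return sum(1 for c in counts.values() if c >= threshold)

# Hypothetical toy data: reads drawn from three genes with skewed abundances.
reads = ["geneA"] * 500 + ["geneB"] * 50 + ["geneC"] * 5

# Saturation curve: detected genes at increasing subsampled depths.
curve = {d: genes_detected(reads, d) for d in (50, 200, 555)}
```

If `curve` is still rising at the deepest point, deeper sequencing should find new genes; the same idea applies unchanged to junction-spanning reads for check 2).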
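For check 3), the duplicate rate at 20 million reads can also be projected from the existing run using a simple Poisson model of library complexity: if the library contains C distinct molecules sampled uniformly, the expected number of distinct reads after N total reads is C(1 - e^(-N/C)). The sketch below is deliberately crude (dedicated tools such as preseq fit richer models), and it treats every non-unique start site as a true PCR duplicate, which overstates duplication for highly expressed transcripts where identical start sites arise by chance:

```python
import math

def estimate_complexity(total_reads, distinct_reads):
    """Solve distinct = C * (1 - exp(-total/C)) for library size C by bisection.
    The left-hand side is increasing in C, so bisection converges."""
    lo, hi = distinct_reads, distinct_reads * 1e6
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * (1 - math.exp(-total_reads / mid)) < distinct_reads:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def projected_distinct(C, total_reads):
    """Expected distinct reads after sampling `total_reads` from C molecules."""
    return C * (1 - math.exp(-total_reads / C))

# Observed: 5M reads with ~40% duplicates, i.e. ~3M distinct reads.
C = estimate_complexity(5e6, 3e6)
new_distinct = projected_distinct(C, 20e6)
projected_dup_rate = 1 - new_distinct / 20e6
```

Under these numbers the model says the library holds only a few million distinct molecules, so quadrupling the depth would mostly resample them and the duplicate rate would rise sharply; that is exactly the signal that resequencing is not worth the money.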
Feel free to comment or add your suggestions. Also, if there are good reviews on this topic (sequencing saturation and library complexity), please post them here.