Hi all,
I am planning to investigate rare mutations in a few unknown viral genomes by RNA-seq. The viral genome is about 12 kb, and my supervisor advised me to do ultra-deep sequencing (preferably as deep as 10,000x). We plan to prepare the libraries with the TruSeq Stranded Total RNA kit and sequence them on a NextSeq 500. My question is: what read length should I use, 2x75 or 2x150? I tried to work it out with the formula coverage = (read count * read length) / total genome size, but I wasn't sure if I did it correctly.
Here are my calculations:
In an ideal situation:
Coverage = 10,000x
Kit: I will use the high-output kit, so the read count generated would be roughly 800 million reads
Total genome size = 12 kb (12,000 bp)
coverage=(read count*read length)/total genome size
10,000 = (800 x 10^6 x read length) / 12,000
Read length = (10,000 x 12,000) / (800 x 10^6) = 0.15
The calculation doesn't seem right to me. I sincerely hope you guys can help me out with this, as I am new to RNA-seq. Thank you in advance.
Since the viral sequences will not be present on their own (I assume you will have some other genome contaminating the sample), you need to account for that in your calculation. So if you expect only 25% of your data to be viral, then running the sample on a NextSeq may not be out of the ordinary.
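To make the adjustment concrete, here is a rough Python sketch of the same coverage formula with an extra term for the fraction of reads that are actually viral. The function names and the `viral_fraction` parameter are made up for illustration, and the 25% figure is just the example value from above, not a measurement. It also assumes every read is full length and maps evenly across the genome, which real data will not do.

```python
import math


def expected_coverage(read_count, read_length, genome_size, viral_fraction=1.0):
    """Mean coverage = (viral reads * read length) / genome size."""
    return read_count * viral_fraction * read_length / genome_size


def reads_needed(coverage, read_length, genome_size, viral_fraction=1.0):
    """Total reads to sequence so the viral share alone reaches `coverage`."""
    return math.ceil(coverage * genome_size / (read_length * viral_fraction))


if __name__ == "__main__":
    genome_size = 12_000      # bp, from the question
    target_coverage = 10_000  # x, from the question
    viral_fraction = 0.25     # assumed example value, not measured

    for read_length in (75, 150):
        n = reads_needed(target_coverage, read_length, genome_size, viral_fraction)
        print(f"{read_length} bp reads: about {n:,} reads needed")

    # Coverage from a full run of 800 million reads (the figure quoted above).
    cov = expected_coverage(800e6, 75, genome_size, viral_fraction)
    print(f"800 million 75 bp reads: about {cov:,.0f}x")
```

Under these assumptions the read requirement is in the low millions for either read length, so the choice between 2x75 and 2x150 is not limited by coverage. The viral fraction is the number that matters most, and it is worth estimating from a pilot run if you can.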