How many reads per sample will give me good coverage for soybean RNA sequencing, given a genome length of 1.1 Gb? I will use the Illumina rRNA-depleted plant stranded mRNA library prep for the cDNA library, and I would like to discover rarely expressed genes. The soybean genome has already been sequenced.
Sequencing depth usually refers to the expected mean coverage at all loci across the target sequences; for RNA-Seq experiments, this assumes that all transcripts are expressed at a similar level.
For researchers with a fixed budget, a critical design question is often whether to increase the sequencing depth at the cost of fewer samples, or to increase the number of samples with limited coverage for each one.
Necessary coverage is determined by the type of study, gene expression level, size of reference genome, published literature, and best practice defined by the scientific community.
C = L x N / G, where:
C : coverage
L : read length
N : number of reads
G : haploid genome length
On a HiSeq 2500 in high-output run mode, a single flow cell with 2 x 100 bp reads yields 4 billion paired-end reads (as claimed by Illumina, so expect somewhat fewer), or 0.5 billion per lane.
C = 2 x 100 x 500,000,000 / 1,100,000,000 = 91X
So you should have about 91X of coverage per lane (~$3000), which you now need to divide among conditions and replicates. If you haven't heard about randomization, replication, blocking, and sampling, now is a good time to read up on them.
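As a rough sketch of the arithmetic above, the Python snippet below recomputes the per-lane coverage and then splits the lane across samples. It follows the calculation in this post by treating the 0.5 billion figure as N and multiplying by 2 for the two 100 bp ends. The design of 2 conditions x 3 replicates and the function names are made up for illustration, not a recommendation.

```python
def coverage(read_length, n_reads, genome_length, paired=True):
    """Expected mean coverage C = L x N / G.

    For paired-end runs each fragment contributes two reads of
    `read_length`, matching the factor 2 used in the post.
    """
    ends = 2 if paired else 1
    return ends * read_length * n_reads / genome_length


def per_sample(total, n_conditions, n_replicates):
    """Split a per-lane quantity evenly across all samples."""
    return total / (n_conditions * n_replicates)


lane_reads = 500_000_000         # per-lane yield quoted in the post
genome_length = 1_100_000_000    # soybean, ~1.1 Gb

lane_cov = coverage(100, lane_reads, genome_length)

# Hypothetical design (assumption): 2 conditions x 3 replicates on one lane
n_conditions, n_replicates = 2, 3
reads_per_sample = per_sample(lane_reads, n_conditions, n_replicates)
cov_per_sample = per_sample(lane_cov, n_conditions, n_replicates)

print(f"Coverage per lane:   {lane_cov:.0f}X")
print(f"Reads per sample:    {reads_per_sample:,.0f}")
print(f"Coverage per sample: {cov_per_sample:.1f}X")
```

Changing `n_conditions` and `n_replicates` shows how quickly the depth per sample drops as the design grows, which is the trade-off between depth and sample number described above.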
If you are only interested in discovery and are not going to compare samples, you could consider the following: prepare a library, sequence it, check whether the depth is sufficient, and if not, sequence it again. One library prep will usually (check your protocol) yield enough material to load multiple times.
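One possible way to "check if the depth is sufficient" is a saturation (rarefaction) curve: subsample the reads you already have and see whether the number of detected genes is still climbing. This is a minimal sketch of that idea, not a method prescribed in this thread. The gene names, the toy counts, and the detection threshold of 5 reads are all assumptions chosen for illustration.

```python
import random


def detection_curve(gene_counts, fractions, min_reads=5, seed=0):
    """Return [(fraction, genes_detected)] for nested subsamples of the reads.

    gene_counts: dict mapping gene name -> number of reads assigned to it.
    A gene counts as detected once it has at least `min_reads` reads
    (the threshold is an arbitrary assumption).
    """
    # One label per read; fine for a toy example, too memory-hungry for real data
    reads = [gene for gene, n in gene_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    rng.shuffle(reads)

    curve = []
    for frac in sorted(fractions):
        k = int(len(reads) * frac)
        seen = {}
        # Prefixes of one shuffled list give nested random subsamples
        for gene in reads[:k]:
            seen[gene] = seen.get(gene, 0) + 1
        detected = sum(1 for n in seen.values() if n >= min_reads)
        curve.append((frac, detected))
    return curve


# Hypothetical counts: a few abundant genes and a few rare ones
toy_counts = {"geneA": 5000, "geneB": 800, "geneC": 60, "geneD": 12, "geneE": 6}

curve = detection_curve(toy_counts, [0.1, 0.25, 0.5, 0.75, 1.0])
for frac, n_detected in curve:
    print(f"{frac:.2f} of reads: {n_detected} genes detected")
```

If the curve has flattened well before the full read set, more sequencing is unlikely to reveal many new genes; if it is still rising at 1.0, loading the same library again may be worthwhile.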