Cross-posted at Stack Exchange's 'Cross Validated' here.
Currently, each of our samples will get at least 30 million reads. With 3 biological replicates per condition, that gives us 90 million reads per condition. Let's say only 50% of those align and we count each read as a single 100 bp segment (the Illumina runs are paired-end, but I think that only improves mapping accuracy). That gives us 45e6 x 100 = 45e8 bp of aligned reads.
The Drosophila exome (including non-coding genes) is roughly 17e3 genes x an average of 6e3 bp per gene = 1e8 bp. 45e8 bp of reads / 1e8 bp of coding and non-coding exome = 45x coverage. My numbers for the Drosophila exome are from here: http://flybase.org/static_pages/docs/release_notes.html
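To make the arithmetic above easy to rerun with other read depths, here is a minimal Python sketch. It uses only the figures quoted in this post; the function name `coverage` and the variable names are mine, chosen for illustration. Note that without rounding the exome to 1e8 bp, the estimate comes out at about 44x rather than 45x.

```python
# Back-of-the-envelope coverage estimate using the figures quoted in the post.
reads_per_sample = 30e6     # minimum reads per sample
replicates = 3              # biological replicates per condition
aligned_fraction = 0.5      # pessimistic alignment rate assumed in the post
read_length_bp = 100        # each read counted as one 100 bp segment

genes = 17e3                # Drosophila genes, coding and non-coding
avg_gene_length_bp = 6e3    # average length per gene


def coverage(reads, aligned_fraction, read_length_bp, target_bp):
    """Mean fold coverage of target_bp by the aligned bases."""
    aligned_bases = reads * aligned_fraction * read_length_bp
    return aligned_bases / target_bp


target_bp = genes * avg_gene_length_bp  # 1.02e8 bp, rounded to 1e8 in the text

# Coverage pooled over the replicates of one condition, and for a single sample.
pooled = coverage(reads_per_sample * replicates, aligned_fraction,
                  read_length_bp, target_bp)
per_sample = coverage(reads_per_sample, aligned_fraction,
                      read_length_bp, target_bp)

print(f"pooled per condition: {pooled:.1f}x, per sample: {per_sample:.1f}x")
```

This is a mean over the whole exome. Read counts in RNA-seq follow transcript abundance, so individual genes can sit far above or below this figure.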
One additional complication: due to the nature of the experiment, only about 1/3 of the animal is affected by my experimental condition. My worry is that the other 2/3 will mask any gene expression changes that I might otherwise see in the affected tissue.
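A rough way to put a number on this masking worry is sketched below. It rests on assumptions that are not established anywhere in this post: that the affected third contributes one third of the total RNA, that a gene's baseline expression is the same in affected and unaffected tissue, and that the change happens only in the affected tissue. The function name `observed_fold_change` is hypothetical.

```python
def observed_fold_change(true_fold_change, affected_fraction=1 / 3):
    """Whole-animal fold change when only part of the tissue responds.

    Assumes (illustration only) equal baseline expression everywhere and
    an RNA contribution proportional to the affected fraction.
    """
    unaffected = 1 - affected_fraction            # stays at baseline (1x)
    affected = affected_fraction * true_fold_change
    return unaffected + affected


# How a few true fold changes in the affected tissue would look in the mix.
for true_fc in (0.5, 2.0, 4.0):
    mixed = observed_fold_change(true_fc)
    print(f"true {true_fc:g}x in affected tissue -> {mixed:.2f}x in whole animal")
```

Under these assumptions, a 2-fold change in the affected tissue appears as only about a 1.33-fold change in the whole animal, so the effect size to be detected is considerably smaller than the underlying biological change.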
The question is: how many genes will I actually be able to detect as differentially expressed at this 45x coverage? Am I only going to be able to detect high-abundance genes? Will increasing coverage to 75x help, or will it just be a waste of money?
One professor told me not to worry about it, and another told me that it is a hypergeometric test in some way. Thanks much.
P.S. I've found this tool for RNA-seq power analysis: http://euler.bc.edu/marthlab/scotty/scotty.php but I don't have pilot data (and don't quite know how to make simulated data) to use with it.