6.9 years ago by
I swear, if someone figured out an equation for this they'd have a very well-cited paper. I'm in the midst of performing RNA-seq analyses, and one of the first questions we had when we started was the same as yours. I work in yeast, so I sort of just figured: if we use 35 bp reads (ABI SOLiD), how many would it take to cover the whole transcriptome, and then how much depth would we want? It ended up not even mattering, because I had so much rRNA contamination (we couldn't use the RiboMinus kit due to the way we were extracting the RNA) that we ended up with only 5% of what we expected. That was actually enough to get some useful data, but we are repeating the experiment to get more reads so we can be sure that what we found is real. I would read lots of papers similar to what you want to do in order to get a rough estimate. Also, if you know of certain genes that are supposed to be up- or down-regulated, they would serve as nice positive/negative controls.
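For what it's worth, the back-of-the-envelope estimate I described can be sketched roughly like this. It is only a sketch: the function name `reads_needed` is made up for illustration, the 10 Mb transcriptome size and 10x depth are placeholder numbers rather than real values for yeast, and treating the 5% figure as a "usable fraction" of reads is my own simplification. It also assumes reads land uniformly across the transcriptome, which real RNA-seq data does not do, since highly expressed genes soak up most of the reads.

```python
import math


def reads_needed(transcriptome_bp, read_len_bp, target_depth, usable_fraction=1.0):
    """Rough number of reads to sequence for a given average depth.

    Uses the simple coverage relation depth = reads * read_len / size,
    then inflates the result for reads lost to e.g. rRNA contamination.
    Assumes uniform coverage, which is NOT true for real RNA-seq data.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_reads = target_depth * transcriptome_bp / read_len_bp
    return math.ceil(usable_reads / usable_fraction)


# Placeholder inputs for illustration only (not measured values).
transcriptome_bp = 10_000_000  # hypothetical transcriptome size in bases
read_len_bp = 35               # read length mentioned above
target_depth = 10              # hypothetical average depth

clean = reads_needed(transcriptome_bp, read_len_bp, target_depth)
contaminated = reads_needed(transcriptome_bp, read_len_bp, target_depth,
                            usable_fraction=0.05)

print(f"reads needed if every read is usable: {clean:,}")
print(f"reads needed if only 5% are usable:   {contaminated:,}")
```

The point of the second call is just that a low usable fraction multiplies the sequencing you need (20-fold at 5%), so it is worth estimating contamination before deciding on depth.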
I mentioned I work in yeast; however, I have come across mouse RNA-seq papers that definitely didn't use the method I used to make an estimate, because they used maybe twice the reads I used, when the mouse transcriptome should be much more than 2x larger than the yeast one. Also, you say you want to compare it to a microarray data set. If you're referring to an expression array and not a tiling array, you may need only minimal depth just to know whether a gene is on or off. Sometimes papers use the absolute maximum number of reads because they want to know where each transcript starts and stops, but you may not need that type of coverage.