We ran two time-series experiments with mixed microbial communities to figure out how expression changes as a major shift in respiration happens. We are an environmental engineering lab with no real prior experience with RNA-seq, and we didn't design our experiment properly. Specifically, we made the common mistake of having no (or very few) biological replicates. I found a lot of posts about RNA-seq experiments with no replicates, which led us to choose edgeR for the analysis, and we have looked at the edgeR authors' suggestions for dealing with no replicates. I was wondering if anyone has insight on the best way to estimate dispersion in our situation.
We have two samples from the same mixed community at T=0. At that point the two experiments were the same, so I believe these samples can be regarded as true biological replicates. None of the other time points have replicates. We are considering the following two options for estimating dispersion:
1) Estimate dispersion based on the two T=0 samples.
2) Identify a set of 'housekeeping' genes in the community and estimate biological dispersion from those genes across all of our samples.
The benefit of the first approach is that all the transcripts would be included in the dispersion estimate. However, it relies on only two replicates. The second approach benefits from having many 'replicates' (16 samples in total), but not all the transcripts are included. We also worry that our identification of housekeeping genes would have to hinge on transcripts with low count variation across the samples, which would bias the selection and could lead to an underestimate of dispersion.
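To make the quantity we are after concrete, here is a minimal sketch of a crude method-of-moments dispersion estimate, assuming the negative binomial relationship Var = mu + phi * mu^2. This is not edgeR's estimator (edgeR uses likelihood-based methods); the function names and the toy counts are made up purely for illustration. For option 1 each row would hold the two T=0 counts for one transcript; for option 2 each row would hold one housekeeping gene's counts across all samples.

```python
from statistics import mean, median, variance


def moment_dispersion(counts, lib_sizes):
    """Crude per-gene NB dispersion: phi = (s^2 - mu) / mu^2, floored at 0.

    Counts are first rescaled to a common (mean) library size.
    Returns None when the gene has no reads at all.
    """
    ref = mean(lib_sizes)
    scaled = [c * ref / n for c, n in zip(counts, lib_sizes)]
    mu = mean(scaled)
    if mu == 0:
        return None
    s2 = variance(scaled)  # sample variance, needs at least 2 samples
    return max((s2 - mu) / mu ** 2, 0.0)


def common_dispersion(count_rows, lib_sizes):
    """Median of the per-gene estimates, skipping genes with no reads."""
    per_gene = [moment_dispersion(row, lib_sizes) for row in count_rows]
    per_gene = [d for d in per_gene if d is not None]
    return median(per_gene)


# Toy, made-up counts for three genes in two samples (not our data).
toy_counts = [[100, 120], [10, 10], [0, 0]]
toy_lib_sizes = [1_000_000, 1_000_000]

phi = common_dispersion(toy_counts, toy_lib_sizes)
print("common dispersion:", phi)
print("BCV (sqrt of dispersion):", phi ** 0.5)
```

With only two samples, each per-gene variance rests on a single degree of freedom, which is why we are unsure how far to trust option 1.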
What do y'all think of our options? Would two replicates be enough to get an estimate? I understand that the reliability of the analysis will be impacted by not having enough replicates for all time points.