Hello
I'd like to know what (biostatistical) issues could arise in a differential expression analysis that combines replicates sequenced at low depth with replicates sequenced at higher depth.
The experiment will compare conditions A and B. We plan to have at most 6 biological replicates per condition (and at least 4).
The goal is to identify differentially expressed genes (DEGs), but we do not know the optimal sequencing depth for this experiment or for future ones. We already ran a first RNA-seq experiment at a suboptimal depth, and a Preseq extrapolation curve suggested we could increase the depth fourfold before reaching the plateau.
The proposal is the following:
- Prepare the 2x6 replicates.
- For each condition, sequence 3 replicates at the suboptimal depth and the other 3 at 4x that depth.
- Use the 2x3 deep-seq replicates to estimate the optimal depth, in terms of number of DEGs, for future experiments.
- For our current experiment, compute the DEGs by combining the low and deep-seq replicates.
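The third step (estimating the depth needed from the deep-seq replicates) is often done by computationally downsampling the deep libraries and re-running the same DE pipeline at each simulated depth. A minimal sketch of the downsampling part, assuming you already have a genes x samples count matrix: binomial thinning keeps each read independently with probability `fraction`, which approximates randomly subsampling reads. The function name `thin_counts`, the toy counts and the fractions are hypothetical; in practice you might instead subsample the reads themselves (e.g. with `seqtk sample`) before quantification.

```python
import numpy as np

def thin_counts(counts, fraction, seed=0):
    """Binomially thin a count matrix (genes x samples).

    Each read is kept independently with probability `fraction`, which
    approximates randomly subsampling reads to a lower depth.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    return rng.binomial(counts, fraction)

# Hypothetical deep-seq counts: 5 genes x 3 replicates.
deep = np.array([
    [1200, 1100, 1350],
    [40, 55, 38],
    [0, 2, 1],
    [8000, 7600, 8300],
    [310, 290, 330],
])

# Simulate 1/4, 1/2 and 3/4 of the deep depth; each thinned matrix would
# then go through the same DE pipeline to count DEGs at that depth.
for frac in (0.25, 0.5, 0.75):
    sub = thin_counts(deep, frac, seed=42)
    print(frac, sub.sum(axis=0))
```

Plotting the number of DEGs against the simulated depth gives a saturation curve for the deep samples; note that it only reflects depth, not the effect of adding replicates.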
A batch term could be added to the GLM design to distinguish the low-depth from the deep-seq replicates.
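The depth term can be sketched as a design matrix. In R this would be something like `model.matrix(~ depth + condition)` for edgeR, or `design = ~ depth + condition` in DESeq2; the numpy version below builds an equivalent treatment-coded matrix (up to the choice of reference level) for a hypothetical sample sheet. Differences in total library size are already handled by normalization (size factors / effective library sizes); the depth term would only absorb any additional systematic difference between the two runs. The key check is that the design has full column rank, i.e. depth is not confounded with condition, which holds because each condition has both low and deep replicates.

```python
import numpy as np

# Hypothetical sample sheet for the proposed 2 x 6 design:
# within each condition, 3 replicates at low depth and 3 at 4x depth.
conditions = ["A"] * 6 + ["B"] * 6
depths = (["low"] * 3 + ["deep"] * 3) * 2

def design_matrix(depths, conditions):
    """Treatment-coded design: intercept, depth == 'deep', condition == 'B'."""
    return np.column_stack([
        np.ones(len(conditions)),
        [1.0 if d == "deep" else 0.0 for d in depths],
        [1.0 if c == "B" else 0.0 for c in conditions],
    ])

X = design_matrix(depths, conditions)
# Full column rank: depth and condition effects are separately estimable.
print("rank:", np.linalg.matrix_rank(X), "of", X.shape[1])

# Counter-example: all of A at low depth and all of B deep -> the depth
# column equals the condition column, so the two effects are confounded.
bad_depths = ["low"] * 6 + ["deep"] * 6
X_bad = design_matrix(bad_depths, conditions)
print("confounded rank:", np.linalg.matrix_rank(X_bad), "of", X_bad.shape[1])
```

The counter-example shows why the split into 3 low + 3 deep within each condition matters: sequencing one condition shallow and the other deep would make the depth and condition effects inseparable.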
Questions:
- does it make sense?
- is combining 3 low-seq with 3 deep-seq replicates better than using 6 low-seq replicates or 3 deep-seq replicates?
- or conversely, are the low-seq replicates useless?
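One rough way to think about the second and third questions, assuming a negative binomial model with known dispersion: for a gene with mean count mu and dispersion phi, the delta-method variance of the log count in one sample is about 1/mu + phi. Quadrupling depth shrinks only the 1/mu (Poisson) part, whereas each extra replicate adds a full sample's worth of information. The sketch below compares the approximate variance of the log fold change for the three designs using inverse-variance weighting; the values of mu and phi are made up, and it ignores dispersion estimation, filtering and cost. In this idealised calculation, a depth term in the balanced 3 + 3 layout does not change the result.

```python
def log_var(mu, phi):
    """Approximate per-sample variance of log(count) under NB(mu, phi).

    Delta method: Var(log Y) ~= Var(Y) / mu**2 = 1/mu + phi.
    """
    return 1.0 / mu + phi

def lfc_var(n_low, n_deep, mu_low, phi, depth_factor=4.0):
    """Approximate variance of the log fold change A vs B.

    Each condition has n_low shallow and n_deep deep replicates; samples are
    combined by inverse-variance weighting (idealised, dispersion known).
    """
    w_low = 1.0 / log_var(mu_low, phi)
    w_deep = 1.0 / log_var(depth_factor * mu_low, phi)
    per_condition = 1.0 / (n_low * w_low + n_deep * w_deep)
    return 2.0 * per_condition  # two independent conditions

designs = {"6 low": (6, 0), "3 low + 3 deep": (3, 3), "3 deep": (0, 3)}
phi = 0.05  # hypothetical biological dispersion
for mu in (5, 50, 500):  # hypothetical mean counts at the low depth
    row = {name: round(lfc_var(nl, nd, mu, phi), 4)
           for name, (nl, nd) in designs.items()}
    print(mu, row)
```

Under these assumptions the mixed design is never worse than 6 shallow or 3 deep replicates, because a deep sample carries at least as much information as a shallow one and more samples never hurt. For well-expressed genes, where phi dominates, its advantage over 6 shallow replicates is small and 3 deep replicates fall clearly behind; so the low-seq replicates are far from useless.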
Thanks in advance for your time
In experimental terms, if you are sequencing the same library to different depths, there should be no batch effect in terms of content. This would essentially be a technical replicate of sequencing.
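To make the technical-replicate point concrete: if the shallow and deep runs come from the same library, they are (ignoring run-specific artefacts) multinomial samples from the same transcript proportions. Once counts are scaled to library size, e.g. counts per million (CPM), they agree apart from sampling noise, which is larger in the shallow run. A small simulation sketch with made-up proportions and depths:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transcript proportions for one library (1,000 genes).
raw = rng.lognormal(mean=0.0, sigma=1.5, size=1000)
props = raw / raw.sum()

# Sequence the same library twice: once shallow, once at 4x depth.
low = rng.multinomial(5_000_000, props)
deep = rng.multinomial(20_000_000, props)

def cpm(counts):
    """Counts per million: scale out library size."""
    return counts / counts.sum() * 1e6

cpm_low, cpm_deep = cpm(low), cpm(deep)
# Correlation on the log scale (log1p avoids log(0) for empty genes).
r = np.corrcoef(np.log1p(cpm_low), np.log1p(cpm_deep))[0, 1]
print(f"log-CPM correlation between runs: {r:.4f}")
```

In this idealised setting the two runs differ only by sampling noise, so no depth-specific batch term would be needed; a real depth term mainly guards against differences between sequencing runs.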
What do you mean by optimal depth? Are you going to be looking for extremely rare transcripts? This page discusses the number of reads generally recommended per sample: https://www.ecseq.com/support/ngs/what-is-a-good-sequencing-depth-for-bulk-rna-seq