I have a 16S rRNA dataset that I am analyzing in QIIME2. The samples are from a diet intervention study comparing dozens of people who consumed either a control diet or an experimental diet. Two stool samples were collected from each individual prior to the diet intervention, and two more stool samples were taken at the end of the intervention. All four samples from each individual were included in the sequencing run. My goal is to describe the diet-mediated changes to gut community diversity and identify taxa that are differentially abundant between the two diet groups.
I am not sure how to handle the two replicates at each time point. Initially I left all the samples in, but I worry about treating the replicates as independent of one another in statistical tests (such as PERMANOVA). I also removed one replicate from each time point and repeated the analysis, and the results were broadly similar to those from my initial analysis. Additionally, I have considered adding the counts from the two replicates within each time point, but I haven't seen others doing this and I wonder how legitimate that approach is.
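To make the pooling option concrete, here is a minimal sketch of what I mean by adding replicate counts. It uses pandas on a toy table rather than QIIME2 itself, and the sample IDs, feature names, counts, and the `subject` and `timepoint` column names are all made up for illustration.

```python
import pandas as pd

# Toy feature table: rows are samples, columns are features (ASVs).
# All IDs and counts here are invented for illustration.
counts = pd.DataFrame(
    {"ASV_1": [10, 14, 3, 5], "ASV_2": [0, 2, 7, 9]},
    index=["S1_pre_a", "S1_pre_b", "S1_post_a", "S1_post_b"],
)

# Hypothetical metadata mapping each sample to its subject and time point.
metadata = pd.DataFrame(
    {"subject": ["S1"] * 4, "timepoint": ["pre", "pre", "post", "post"]},
    index=counts.index,
)

# Sum the replicate counts within each subject x time point combination,
# leaving one pooled row per individual per time point.
pooled = counts.groupby([metadata["subject"], metadata["timepoint"]]).sum()
print(pooled)
```

This would leave one row per individual per time point, so the replicates are no longer counted as separate observations. Whether summing is preferable to averaging or to keeping a single replicate is exactly what I am unsure about.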
My questions: What would be the best way to deal with this setup for statistical purposes? Is it 'bad' to treat the replicates as independent of each other? Are there examples in the literature of people addressing a similar situation?
I appreciate any feedback you can provide!