2.9 years ago
zabendiks • 0

Hello,

I have a 16S rRNA dataset that I am analyzing in QIIME2. The samples are from a diet intervention study comparing dozens of people who consumed either a control diet or an experimental diet. Two stool samples were collected from each individual prior to the diet intervention, and two more stool samples were taken at the end of the intervention. All four samples from each individual were included in the sequencing run. My goal is to describe the diet-mediated changes to gut community diversity and identify taxa that are differentially abundant between the two diet groups.

I am not sure how to handle the two replicates at each time point. Initially I left all the samples in, but I worry about treating the replicates as independent of each other in statistical tests (such as PERMANOVA). I also removed one replicate from each time point and repeated the analysis, and this somewhat recapitulates the results from my initial analysis. Additionally, I have considered summing the counts from the two replicates within each time point, but I haven't seen others doing this and I wonder how legitimate that approach is.
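To make the summing idea concrete, here is a minimal sketch of collapsing replicates outside of QIIME2 with pandas. The table, sample IDs, and the `subject` and `timepoint` column names are all hypothetical, and whether summing is statistically appropriate is exactly the open question here; the sketch only shows the mechanics.

```python
import pandas as pd

# Hypothetical feature table: rows are samples, columns are taxa (raw read counts).
counts = pd.DataFrame(
    {"taxon_A": [10, 14, 30, 26], "taxon_B": [5, 3, 0, 2]},
    index=["s1", "s2", "s3", "s4"],
)

# Hypothetical metadata: two replicates per person per time point.
meta = pd.DataFrame(
    {
        "subject": ["P1", "P1", "P1", "P1"],
        "timepoint": ["pre", "pre", "post", "post"],
    },
    index=counts.index,
)

# Sum replicate counts so each (subject, timepoint) pair becomes one row.
collapsed = counts.groupby([meta["subject"], meta["timepoint"]]).sum()
print(collapsed)
```

After this step each person contributes one row per time point, so replicates are no longer treated as independent samples downstream.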

My questions: What would be the best way to deal with this setup for statistical purposes? Is it 'bad' to treat the replicates as independent of each other? Are there examples in the literature of people addressing a similar situation?

I appreciate any feedback you can provide!

Hi :) I have a similar question now. Did you find a solution? I would appreciate any suggestions. Thank you!

Do not add answers unless you're answering the top level question. Use Add Comment or Add Reply as appropriate instead. Your post has been moved to a comment this time, but please be more careful in the future.

4 months ago
dago ★ 2.7k

If you want to look at taxa that are differentially abundant (DA) between groups and you have repeated measures (2 time points), you need to perform a paired test (pre vs post). Which statistical test you use is up to you. If you use a non-parametric test (e.g. the Wilcoxon signed-rank test), you simply specify that the data are paired and format the data accordingly. The same is true if you use any other test based on linear models, e.g. the t-test.

Just to be clear, what I just mentioned applies to a set-up like this:

Time point 1: Person 1, Person 2, Person 3, ...

Time point 2: Person 1, Person 2, Person 3, ...

So the samples are paired by person across the two time points, and you want to look at the difference between the time points within each person. If you provide more details, I am sure that the community can help you better.
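As a rough illustration of the paired set-up above, here is a minimal sketch using the Wilcoxon signed-rank test from SciPy. The abundance values are made up, and it assumes there is exactly one value per person per time point (for example after collapsing the replicates), listed in the same person order in both arrays.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical relative abundances (%) of a single taxon.
# Position i in both arrays must refer to the same person.
pre = np.array([12.0, 8.0, 15.0, 10.0, 20.0, 5.0, 11.0, 9.0])
post = np.array([18.0, 11.0, 22.0, 19.0, 25.0, 13.0, 13.0, 10.0])

# Paired test on the within-person differences (pre - post), two-sided by default.
stat, p_value = wilcoxon(pre, post)
print(f"W = {stat}, p = {p_value:.4f}")
```

In a real analysis this would be repeated for every taxon, so the resulting p-values would need a multiple-testing correction.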