Hello, I am using data from a study that has NAT and tumor (T) samples. My clinical metadata is mixed: some patients are paired (they have both NAT and T samples) and others are unpaired (they have only an NAT or only a T sample).
I want to use four differential abundance (DA) tools (DESeq2, ALDEx2, ANCOM-BC2 and MaAsLin2) and keep the analyses comparable, with significance defined as padj < 0.05. How should I set up the design? I can add Patient ID as a random effect in MaAsLin2 and ANCOM-BC2, but I don't know how to model it in the other two tools.
I tried subsetting the data and running the four tools. Only DESeq2 gave significant results; none of the other tools returned anything with padj < 0.05. But maybe I am doing something wrong.
My concern is that if I proceed with all four tools and include the patient effect only in those that support it, the outputs may not be comparable, because the tools would be using different designs.
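Since the goal is one common rule (padj < 0.05) across four tools, it can help to apply that rule in one place after exporting each tool's adjusted p-values, rather than relying on each package's own "significant" output. The Python sketch below is tool-agnostic and assumes each tool's results have already been read into a dictionary mapping taxon name to adjusted p-value; that layout and the function names are hypothetical. It reports which taxa pass in each tool and in all of them. It makes the thresholding consistent; it does not make the underlying designs equivalent.

```python
import math
from collections import Counter

ALPHA = 0.05  # shared cut-off: a taxon is called significant if padj < 0.05


def significant_taxa(padj_by_taxon, alpha=ALPHA):
    """Taxa whose adjusted p-value is strictly below alpha.

    Missing values (None or NaN, e.g. DESeq2's NA from independent
    filtering) are treated as not significant.
    """
    return {
        taxon
        for taxon, padj in padj_by_taxon.items()
        if padj is not None and not math.isnan(padj) and padj < alpha
    }


def compare_tools(results_by_tool, alpha=ALPHA):
    """Apply the same threshold to every tool and count agreement per taxon."""
    sig = {tool: significant_taxa(res, alpha) for tool, res in results_by_tool.items()}
    votes = Counter(taxon for taxa in sig.values() for taxon in taxa)
    in_all = {taxon for taxon, n in votes.items() if n == len(sig)}
    return sig, votes, in_all


# Made-up adjusted p-values for illustration only (not results from this study).
example = {
    "DESeq2":    {"taxon_a": 0.001, "taxon_b": 0.03, "taxon_c": 0.20},
    "ALDEx2":    {"taxon_a": 0.04,  "taxon_b": 0.09, "taxon_c": 0.50},
    "ANCOM-BC2": {"taxon_a": 0.02,  "taxon_b": float("nan"), "taxon_c": 0.70},
    "MaAsLin2":  {"taxon_a": 0.01,  "taxon_b": 0.06, "taxon_c": 0.90},
}
sig, votes, in_all = compare_tools(example)
print(sorted(votes.items()))  # [('taxon_a', 4), ('taxon_b', 1)]
print(in_all)                 # {'taxon_a'}
```

Reporting the per-taxon vote count alongside the "significant in all tools" set is one common way to summarise agreement when tools disagree, as they did here.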
This is my first time working with microbiome data, so I don't know how to handle clinical metadata that mixes paired and unpaired patients.
Thank you
What's your N for paired patients (those with both NAT & T samples)? It's crucial for tweaking the models:
`(1|patient_id)` in ANCOM/MaAsLin2, fixed `~ patient_id + condition` in DESeq2. With N of about 20, you'd expect 5-20% significant taxa if effects aren't tiny. Can you share yours (and maybe total samples and sequencing depth)? I can paste ready-to-use code snippets. Is this your first time with microbiome data?
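For reference, `(1|patient_id)` is lme4-style notation for a per-patient random intercept. On a transformed abundance scale (MaAsLin2, for instance, log-transforms by default), the model for a single taxon $j$ can be sketched as follows, with $s$ indexing samples, $p(s)$ the patient of sample $s$, and $T_s = 1$ for tumor and $0$ for NAT. DESeq2 instead fits a negative binomial GLM with a log link, but the patient term plays the same role. Random-intercept form:

$$
y_{sj} = \beta_{0j} + \beta_{1j} T_s + u_{p(s)j} + \varepsilon_{sj}, \qquad u_{pj} \sim \mathcal{N}(0, \sigma^2_{u,j}), \quad \varepsilon_{sj} \sim \mathcal{N}(0, \sigma^2_j)
$$

Fixed-effect form (`~ patient_id + condition`), where each patient gets its own intercept $\alpha_{pj}$:

$$
y_{sj} = \alpha_{p(s)j} + \beta_{1j} T_s + \varepsilon_{sj}
$$

In the random-intercept form, unpaired patients still inform $\beta_{1j}$ through between-patient comparisons, weighted by how large $\sigma^2_{u,j}$ is relative to $\sigma^2_j$. In the fixed-effect form, a patient with a single sample has its $\alpha_{pj}$ fitted exactly to that sample, so $\beta_{1j}$ is estimated only from the within-patient NAT/T differences of paired patients.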
Thank you for your reply, Kevin!
Total samples: 506
Unique patients: 351
Paired patients: 154, these have both tumor and NAT (normal adjacent to tumor) samples
Unpaired patients: 197, these have only one group (either Tumor or NAT)
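The paired/unpaired split above can be recomputed directly from the sample metadata. Note that 154 × 2 + 197 = 505, one fewer than the 506 total samples, so it may be worth checking whether some patient contributes more than one sample to a group. The stdlib sketch below flags such patients separately instead of silently counting them as paired; the `(sample_id, patient_id, group)` layout and the `"NAT"`/`"T"` labels are assumptions about how the metadata is stored.

```python
from collections import defaultdict


def classify_patients(samples):
    """Split patients by how many NAT/T samples they contribute.

    samples: iterable of (sample_id, patient_id, group), group being "NAT" or "T".
    Returns (paired, unpaired, irregular) lists of patient IDs:
      paired    = exactly one NAT and one T sample
      unpaired  = a single sample
      irregular = anything else (e.g. two tumor samples), to inspect by hand
    """
    groups_by_patient = defaultdict(list)
    for _sample_id, patient_id, group in samples:
        groups_by_patient[patient_id].append(group)

    paired, unpaired, irregular = [], [], []
    for patient_id, groups in sorted(groups_by_patient.items()):
        if sorted(groups) == ["NAT", "T"]:
            paired.append(patient_id)
        elif len(groups) == 1:
            unpaired.append(patient_id)
        else:
            irregular.append(patient_id)
    return paired, unpaired, irregular


# Toy metadata (hypothetical IDs): P1 paired, P2/P3 unpaired, P4 has two tumor samples.
toy = [
    ("s1", "P1", "NAT"), ("s2", "P1", "T"),
    ("s3", "P2", "T"),
    ("s4", "P3", "NAT"),
    ("s5", "P4", "T"), ("s6", "P4", "T"),
]
print(classify_patients(toy))  # (['P1'], ['P2', 'P3'], ['P4'])
```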
For differential abundance (species-level, after 10% prevalence filtering), I’m using four tools with the following designs:
Across 81 taxa, I’m getting fewer than 10 significant taxa (padj < 0.05) from most tools, and sometimes only 1–2. I’m wondering if the design might be too conservative or not appropriate for this type of mixed metadata.
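One concrete source of conservativeness worth ruling out is that the tools do not all adjust p-values the same way by default. DESeq2's `results()`, ALDEx2's `we.eBH`/`wi.eBH` columns and MaAsLin2 (`correction = "BH"`) use Benjamini-Hochberg, whereas, in the ANCOMBC versions I have seen, `ancombc2()` defaults to `p_adj_method = "holm"`, which controls the family-wise error rate and is stricter. Two related defaults: DESeq2's `results()` tunes independent filtering for `alpha = 0.1` unless `alpha = 0.05` is passed, and MaAsLin2's `significant_results.tsv` uses `max_significance = 0.25`, so filtering `all_results.tsv` on `qval < 0.05` is the comparable step. The stdlib sketch below implements both adjustments, following the standard definitions used by R's `p.adjust()`, to show the difference on the same made-up raw p-values.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjustment (as in R's p.adjust(method = "BH"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def holm_adjust(pvals):
    """Holm step-down adjustment (as in R's p.adjust(method = "holm"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for k, i in enumerate(order):  # k is the 0-based ascending rank
        running_max = max(running_max, min(1.0, (m - k) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.005]  # made-up raw p-values for four taxa
bh = bh_adjust(raw_p)
holm = holm_adjust(raw_p)
print(sum(q < 0.05 for q in bh))    # 4
print(sum(q < 0.05 for q in holm))  # 2
```

The same four raw p-values give four hits under BH and two under Holm, so it is worth setting `p_adj_method = "BH"` in `ancombc2()` if the aim is a like-for-like comparison.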
I initially thought about using a fixed-effect design in DESeq2, like ~ Patient_ID + Group, but since over half my patients are unpaired, that leads to a non–full-rank model.
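Whether a design is rank-deficient can be checked before fitting. In R, the usual check compares `qr(model.matrix(~ Patient_ID + Group, coldata))$rank` with the number of columns, and DESeq2 performs a rank check of this kind. The stdlib sketch below does the same on a toy data set: it builds a treatment-coded matrix for `~ Patient_ID + Group` and computes its exact rank. In this toy example the design stays full rank with unpaired patients present, because each singleton patient's own intercept fits that sample exactly (it adds no information about Group rather than breaking the model). Rank deficiency appears once a patient-level covariate is added next to Patient_ID, since it is a linear combination of the patient columns. The toy data and the `ages` covariate are hypothetical.

```python
from fractions import Fraction


def design_matrix(samples, patient_covariate=None):
    """Treatment-coded model matrix for ~ Patient_ID + Group (+ optional covariate).

    samples: list of (patient_id, group) with group "NAT" (coded 0) or "T" (coded 1).
    The first patient ID in sorted order is the reference level, mirroring
    R's default factor coding. patient_covariate: optional {patient_id: value}.
    """
    patients = sorted({patient for patient, _ in samples})
    non_reference = patients[1:]
    rows = []
    for patient, group in samples:
        row = [1]  # intercept
        row += [1 if patient == p else 0 for p in non_reference]  # patient dummies
        row.append(1 if group == "T" else 0)  # Group: T vs NAT
        if patient_covariate is not None:
            row.append(patient_covariate[patient])  # constant within each patient
        rows.append(row)
    return rows


def matrix_rank(rows):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # column is a combination of earlier pivot columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Toy data: P1 and P4 paired, P2 NAT only, P3 T only.
toy = [("P1", "NAT"), ("P1", "T"), ("P2", "NAT"), ("P3", "T"), ("P4", "NAT"), ("P4", "T")]
X = design_matrix(toy)
print(matrix_rank(X), len(X[0]))  # 5 5 -> full rank despite unpaired patients

ages = {"P1": 60, "P2": 55, "P3": 70, "P4": 48}  # hypothetical patient-level covariate
X_age = design_matrix(toy, ages)
print(matrix_rank(X_age), len(X_age[0]))  # 5 6 -> rank-deficient
```

If the real design only contained Patient_ID and Group, the rank error may have come from an additional patient-level column; printing the columns of the model matrix is a quick way to find it.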
Would it make more sense to:
Restrict the DA analysis to paired patients only (so I can use ~ Patient_ID + Group consistently across the tools that allow pairing),
Treat all samples as independent and ignore the pairing, or
Keep the full mixed dataset and accept that only some tools (MaAsLin2 / ANCOM-BC2) can model Patient_ID as a random effect while others can’t?
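If the paired-only route is taken, tools that run a paired test generally need the NAT and T samples in matching order. For example, my understanding is that ALDEx2's `aldex.ttest(..., paired.test = TRUE)` pairs samples by their position within each condition, so the count table's columns should be reordered accordingly (worth confirming in the ALDEx2 documentation for the installed version). The stdlib sketch below, with a hypothetical `(sample_id, patient_id, group)` layout and IDs, keeps only patients with exactly one NAT and one T sample and returns a column order in which the i-th NAT sample and the i-th T sample come from the same patient.

```python
from collections import defaultdict


def matched_pairs(samples, first="NAT", second="T"):
    """Pair each patient's `first` and `second` sample, for paired patients only.

    samples: iterable of (sample_id, patient_id, group).
    Patients with any other layout (one sample, duplicates, other groups) are skipped.
    Returns a list of (first_sample_id, second_sample_id), sorted by patient ID.
    """
    by_patient = defaultdict(list)
    for sample_id, patient_id, group in samples:
        by_patient[patient_id].append((group, sample_id))

    pairs = []
    for patient_id in sorted(by_patient):
        entries = by_patient[patient_id]
        if sorted(g for g, _ in entries) == sorted([first, second]):
            ids = dict(entries)  # group -> sample_id
            pairs.append((ids[first], ids[second]))
    return pairs


toy = [
    ("s1", "P1", "NAT"), ("s2", "P1", "T"),
    ("s3", "P2", "T"),
    ("s5", "P4", "T"), ("s6", "P4", "T"),
    ("s7", "P5", "T"), ("s8", "P5", "NAT"),
]
pairs = matched_pairs(toy)
# All NAT samples first, then the T samples in the same patient order.
column_order = [a for a, _ in pairs] + [b for _, b in pairs]
print(pairs)         # [('s1', 's2'), ('s8', 's7')]
print(column_order)  # ['s1', 's8', 's2', 's7']
```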
I’d really appreciate your thoughts on what’s generally considered best practice for differential abundance analysis in mixed datasets like this (partly paired, partly unpaired).