Question

NAT vs T samples: differential abundance (DA) analysis

2 days ago

San • 0

Hello, I am using a study's data that has NAT and T samples. My clinical metadata is mixed: some patients are paired (they have both NAT and T samples) and others are unpaired (they have either NAT or T samples only).

I want to use these 4 DA tools (DESeq2, ALDEx2, ANCOM and MaAsLin2) and make the analyses comparable, with significance defined as padj < 0.05. How should I set up the design? I can add Patient ID as a random effect in MaAsLin2 and ANCOM, but I don't know how to model it in the other two tools.

I tried subsetting and running the 4 tools. Only DESeq2 gave me significant results; the other tools had no results with padj < 0.05. But maybe I am doing something wrong.

My concern is this: if I proceed with the 4 tools and add Patient ID as an effect only in the tools that accept it, the outputs may not be comparable, because the tools would be using different designs.
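The tools cannot all share one design, so one common way to summarise their outputs is a consensus table. It counts how many tools flag each taxon at padj < 0.05 and reports the taxa that reach some minimum level of agreement. The sketch below assumes you have already extracted a set of significant taxon names from each tool's results table. The function name `consensus_calls` and the taxon names are hypothetical placeholders.

```python
from collections import Counter

def consensus_calls(sig_by_tool, min_tools=2):
    """Taxa called significant (padj < 0.05) by at least `min_tools` tools.

    `sig_by_tool` maps a tool name to the set of taxa it flagged. Building
    these sets from each tool's own results table is assumed to happen elsewhere.
    """
    counts = Counter()
    for taxa in sig_by_tool.values():
        counts.update(set(taxa))  # count each taxon at most once per tool
    # Most-supported taxa first, ties broken alphabetically
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(taxon, n) for taxon, n in ranked if n >= min_tools]

# Placeholder taxon names, not real results
sig_by_tool = {
    "DESeq2": {"taxon_A", "taxon_B", "taxon_C"},
    "ALDEx2": {"taxon_A"},
    "ANCOM-BC2": {"taxon_A", "taxon_B"},
    "MaAsLin2": {"taxon_B"},
}
print(consensus_calls(sig_by_tool))  # [('taxon_A', 3), ('taxon_B', 3)]
```

The `min_tools` threshold is a reporting choice, not a formal test. Agreement across tools is easier to defend when the designs differ, but the choice of threshold remains a judgment call.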

This is my first time working with microbiome data, so I don't know how to handle clinical metadata with both paired and unpaired patients.

Thank you

Tags: abundance, differential, microbiome

written 2 days ago by San • updated 1 day ago by Kevin Blighe

What's your N for paired patients (those with both NAT & T samples)? It's crucial for tweaking the models:

If low (<10-15), power is limited. ANCOM/MaAsLin2 may stay conservative (few significant taxa at padj < 0.05), while DESeq2 (using fixed effects) could overstate hits. Consider adding informative covariates or running sensitivity analyses.
If higher (>30), modelling the patient in every tool will balance things nicely, e.g., (1|patient_id) as a random effect in ANCOM/MaAsLin2 and a fixed ~ patient_id + condition in DESeq2.

With N = 20 or so, you'd expect 5-20% of taxa to be significant if the effects aren't tiny. Can you share your N (and perhaps total samples and sequencing depth)? I can then paste ready-to-use code snippets. Is this your first time with microbiome data?

written 2 days ago by Kevin Blighe

Thank you for your reply, Kevin!

Total samples: 506

Unique patients: 351

Paired patients: 154; these have both Tumor and NAT (normal adjacent to tumor) samples

Unpaired patients: 197; these have only one group (either Tumor or NAT)
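Counts like these can be derived directly from a sample table. The check is also worth running for another reason: if every paired patient had exactly two samples and every unpaired patient one, the total would be 154 × 2 + 197 = 505, one fewer than the 506 reported. It may therefore help to flag patients with more than two samples. The sketch below assumes a hypothetical list of `(sample_id, patient_id, group)` tuples with groups coded `"T"` and `"NAT"`. The function name and toy data are placeholders.

```python
from collections import defaultdict

def classify_patients(samples):
    """Split patients into paired (both T and NAT present) and unpaired.

    `samples` is a list of (sample_id, patient_id, group) tuples, with group
    coded "T" or "NAT" (a hypothetical layout of the clinical metadata).
    """
    groups = defaultdict(set)
    n_samples = defaultdict(int)
    for _sample_id, patient_id, group in samples:
        groups[patient_id].add(group)
        n_samples[patient_id] += 1
    paired = sorted(p for p, g in groups.items() if {"T", "NAT"} <= g)
    unpaired = sorted(p for p, g in groups.items() if len(g) == 1)
    # Patients with more than two samples deserve a manual look
    multi = sorted(p for p, n in n_samples.items() if n > 2)
    return paired, unpaired, multi

samples = [
    ("s1", "P1", "T"), ("s2", "P1", "NAT"),
    ("s3", "P2", "T"),
    ("s4", "P3", "NAT"),
    ("s5", "P4", "T"), ("s6", "P4", "NAT"), ("s7", "P4", "T"),
]
paired, unpaired, multi = classify_patients(samples)
print(paired, unpaired, multi)  # ['P1', 'P4'] ['P2', 'P3'] ['P4']
```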

For differential abundance (species-level, after 10% prevalence filtering), I’m using four tools with the following designs:

MaAsLin2: abundance ~ Group + Center + (1 | Patient_ID)

ANCOM-BC2: ~ Group + Center + (1 | Patient_ID)

DESeq2: ~ Group + Center

ALDEx2: ~ Group + Center

Across 81 taxa, I’m getting fewer than 10 significant taxa (padj < 0.05) from most tools, and sometimes only 1–2. I’m wondering if the design might be too conservative or not appropriate for this type of mixed metadata.
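Counts of padj < 0.05 are only comparable across tools if each tool applies the same multiple-testing correction. DESeq2's `results()` and MaAsLin2 use Benjamini-Hochberg (BH) by default. ANCOM-BC2's `ancombc2()`, I believe, defaults to `p_adj_method = "holm"`, which is more conservative. It is worth confirming this in the installed version's documentation. The sketch below reimplements BH and Holm adjustment in plain Python (mirroring R's `p.adjust` methods of the same names) to show how the same raw p-values can give different significant counts. The p-values are made up.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (same idea as p.adjust(method="BH"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def holm_adjust(pvals):
    """Holm step-down adjusted p-values (same idea as p.adjust(method="holm"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank is 0-based
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

pvals = [0.001, 0.008, 0.012, 0.03, 0.3]  # made-up raw p-values
print(sum(p < 0.05 for p in bh_adjust(pvals)))    # 4
print(sum(p < 0.05 for p in holm_adjust(pvals)))  # 3
```

Where a tool allows it, setting the same correction everywhere (or re-adjusting the raw p-values yourself) removes one source of disagreement between tools.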

I initially thought about using a fixed-effect design in DESeq2, like ~ Patient_ID + Group, but since over half my patients are unpaired, that leads to a non–full-rank model.
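One way to see where rank deficiency comes from is to build the treatment-coded design matrix for a toy layout and compute its rank. In the sketch below, a hypothetical four-patient example (two paired, two unpaired) suggests that unpaired patients by themselves do not make ~ Patient_ID + Group rank-deficient. The collinearity appears once a patient-level covariate such as Center is added, because here each patient belongs to exactly one center. That is an assumption about the real data, so whether this triggered the error you saw is worth verifying. The helpers `design_matrix` and `matrix_rank` are illustrative, not part of any package.

```python
from fractions import Fraction

def design_matrix(samples, terms):
    """Intercept plus one 0/1 dummy per non-reference factor level, similar in
    spirit to how R's model.matrix() codes factors. `samples` is a list of dicts."""
    columns = []
    for term in terms:
        levels = sorted({s[term] for s in samples})
        columns += [(term, level) for level in levels[1:]]  # first level = reference
    return [[1] + [int(s[t] == lvl) for t, lvl in columns] for s in samples]

def matrix_rank(rows):
    """Rank via exact Gaussian elimination (Fraction avoids rounding issues)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

samples = [  # P1, P2 paired; P3, P4 unpaired; each patient in one center
    {"Patient_ID": "P1", "Group": "T", "Center": "C1"},
    {"Patient_ID": "P1", "Group": "NAT", "Center": "C1"},
    {"Patient_ID": "P2", "Group": "T", "Center": "C2"},
    {"Patient_ID": "P2", "Group": "NAT", "Center": "C2"},
    {"Patient_ID": "P3", "Group": "T", "Center": "C1"},
    {"Patient_ID": "P4", "Group": "NAT", "Center": "C2"},
]
X1 = design_matrix(samples, ["Patient_ID", "Group"])
X2 = design_matrix(samples, ["Patient_ID", "Group", "Center"])
print(len(X1[0]), matrix_rank(X1))  # 5 5 -> full rank
print(len(X2[0]), matrix_rank(X2))  # 6 5 -> rank-deficient
```

Even when the fixed-effect design is full rank, each unpaired patient's single sample is absorbed by its own patient coefficient. The Group estimate is then informed essentially by the paired patients only, which is close to the paired-only option.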

Would it make more sense to:

Restrict the DA analysis to paired patients only (so I can use ~ Patient_ID + Group consistently across the tools that allow pairing)?

Treat all samples as independent and ignore the pairing? or

Keep the full mixed dataset and accept that only some tools (MaAsLin2 / ANCOM-BC2) can model Patient_ID as a random effect while others can’t?
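The cost of ignoring the pairing can be large when patients differ much more from each other than NAT differs from T within a patient. The toy example below uses made-up log-abundances for one taxon in five hypothetical paired patients. It compares a paired t statistic (within-patient differences) with a Welch t statistic that treats every sample as independent. This illustrates the general statistical point only; it is not a model any of the four tools fits.

```python
import math
import statistics

def paired_t(x, y):
    """Paired t statistic: mean within-patient difference over its standard error."""
    diffs = [b - a for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def welch_t(x, y):
    """Welch two-sample t statistic, treating every sample as independent."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se

# Made-up values: big differences between patients, small consistent NAT -> T shift
nat = [1.0, 3.0, 5.0, 7.0, 9.0]
tumor = [1.6, 3.4, 5.5, 7.6, 9.4]
print(round(paired_t(nat, tumor), 1))  # 11.2
print(round(welch_t(nat, tumor), 2))   # 0.25
```

Roughly speaking, a random intercept for Patient_ID (as in MaAsLin2 or ANCOM-BC2) sits between the two. It exploits within-patient contrasts from the paired patients while still letting the unpaired samples contribute.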

I’d really appreciate your thoughts on what’s generally considered best practice for differential abundance analysis in mixed datasets like this (partly paired, partly unpaired).

written 2 days ago by San • updated 2 days ago by GenoMax