Question: Does anyone know good open data sources for metagenomic data linked to a given condition?
I'm looking for either a database (like SRA) or a study that provides its data with associated labels. Ideally, this would be metagenomic data (either sequences or abundance tables) from a study that shows a strong link between a feature, such as a species, and the condition being studied.

I'm reaching out because I haven't been able to find any studies with enough data for my application (implementing machine learning algorithms). Ideally, that means at least 100 samples for the condition being studied, and maybe the same number of controls.

Any help is appreciated. Thanks, guys.

Hi Edward,

You can check the post mentioned below. It may serve your purpose.

Thanks! I'll look through that post. Of course, I'm probably being a little too nitpicky with my search... we'll never really find ideal data in the real world, will we?

The closest databases I can think of are

The American Gut Project has the quantity of data, so it might be of the most interest to you for machine learning purposes. Good luck.

