Hello all,
Does anyone have a procedure for creating mock metagenomic datasets, or know of any resources where I can find ready-made datasets?
A general way to make a mock metagenome is to collect the individual genomes of interest, assign each one a relative abundance, and generate simulated sequencing reads from them in those proportions. Searching for metagenomic read simulators should give you some ideas on how to do it.
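To make the idea concrete, here is a minimal sketch in plain Python, not a replacement for a real read simulator. The function names (`simulate_mock_metagenome`, `write_fasta`) and the toy inputs are my own, and the sketch assumes error-free, fixed-length, single-end reads. It also assumes that "abundance" means relative genome copy number, so a genome's share of reads scales with abundance times genome length.

```python
import random

# Translation table used to complement A/C/G/T bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_mock_metagenome(genomes, abundances, n_reads, read_len=100, seed=0):
    """Sample error-free reads from several genomes.

    genomes:    dict mapping genome name -> sequence string (hypothetical input)
    abundances: dict mapping genome name -> relative copy number (assumption)
    Returns a list of (read_id, sequence) tuples; the read_id records the source.
    """
    for name, seq in genomes.items():
        if len(seq) < read_len:
            raise ValueError(f"genome {name!r} is shorter than read_len")

    rng = random.Random(seed)
    names = sorted(genomes)
    # Assumption: expected reads per genome ~ copy number x genome length.
    weights = [abundances[n] * len(genomes[n]) for n in names]

    reads = []
    for i in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_len + 1)
        read = genome[start:start + read_len]
        # Sample either strand with equal probability.
        if rng.random() < 0.5:
            read = reverse_complement(read)
        reads.append((f"read{i}|{name}", read))
    return reads


def write_fasta(reads, path):
    """Write (read_id, sequence) tuples to a FASTA file."""
    with open(path, "w") as handle:
        for read_id, seq in reads:
            handle.write(f">{read_id}\n{seq}\n")
```

In practice you would load the genomes from FASTA files, call something like `simulate_mock_metagenome(genomes, {"A": 0.9, "B": 0.1}, n_reads=100000)`, and write the result out. Because the source genome is kept in each read ID, you have a ground truth to check your classifier or assembler against. For realistic error profiles, quality scores, or paired-end reads, a dedicated simulator is the better choice.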
If you are interested in existing datasets, I suggest you read these two papers and consult their resources:
If you're still looking, the NIAID Data Ecosystem is a discovery portal for reusable datasets: https://data.niaid.nih.gov/ It integrates datasets from infectious and immune-mediated disease repositories in one searchable place, and it includes thousands of metagenomic datasets. Search for "metagenomes" to find them. You can then narrow the results by metagenome type with the Pathogen filter (e.g. "gut metagenome") or by repository with the "source" filter. For example, if you're interested in raw sequences, filter for metagenome-related datasets coming from SRA. If you just want to see where metagenomic datasets are hosted, browse the sources filter for ideas. In any case, good luck.
Thank you @Mensur Dlakic,
This is very helpful.