Metagenome Read Simulators
7.4 years ago
dlawre14 ▴ 30

I've been researching metagenome read simulators and cannot find a consensus on which one to use. The two main ones I've seen are MetaSim and BEAR. Does anyone have experience with these or others, or any advice on a good metagenome simulator?

metagenomics simulator reads • 3.1k views
7.4 years ago

BBMap's simulator, randomreads.sh, has a metagenome mode; just add the flag "metagenome". E.g.

cat bug1.fa bug2.fa bug3.fa > bugs.fa
randomreads.sh ref=bugs.fa out=reads.fq reads=10m len=150 paired metagenome

BBMap also includes another tool, "mutate.sh", which creates strains that differ slightly from a reference. This can be useful when simulating metagenomes.


As of BBMap v36.59, randomreads.sh can simulate metagenomes.

coverage=X automatically sets "reads" to the number needed for an average coverage of X (decimal values are allowed).
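To make the relationship between coverage and read count concrete, here is a rough sketch of the arithmetic. It assumes the usual definition, coverage = reads × read length / reference length, and is not taken from BBMap's source. The helper names ref_length and reads_for_coverage are made up for this example, and how paired reads are counted is left out.

```shell
# Hypothetical helpers for illustration; not part of BBMap.

# Total number of bases in a FASTA file (header lines start with ">").
ref_length() {
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}

# Number of reads needed for a target average coverage, rounded up.
# Assumes coverage = reads * read_length / reference_length.
reads_for_coverage() {
    local ref_len=$1 read_len=$2 coverage=$3
    awk -v g="$ref_len" -v l="$read_len" -v c="$coverage" 'BEGIN {
        r = c * g / l
        n = int(r)
        if (n < r) n++          # round up to a whole read
        printf "%.0f\n", n
    }'
}

# Example: a 5 Mbp reference, 150 bp reads, 10x average coverage.
reads_for_coverage 5000000 150 10   # prints 333334
```

With a real reference, something like reads_for_coverage "$(ref_length bugs.fa)" 150 10 gives a ballpark figure to compare against what randomreads.sh reports.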

The "metagenome" flag assigns each scaffold a random exponential variable, which determines the probability that a read is generated from that scaffold. So if you concatenate 20 bacterial genomes, you can run randomreads.sh and get a metagenome-like abundance distribution. It can also be used for RNA-seq with a transcriptome reference.
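The idea of exponentially distributed abundances can be sketched with a few lines of awk. This is only an illustration of the concept, not BBMap's implementation: it draws one Exponential(1) weight per sequence by inverse-transform sampling and normalises the weights into probabilities. The function name exp_abundances and the seed argument are made up here, and sequence length is deliberately ignored.

```shell
# Illustrative only: mimics the idea of exponential abundances, not BBMap's code.

# Print "name<TAB>probability" for each sequence in a FASTA file.
# Each sequence gets a weight drawn from Exponential(1); weights are then
# normalised so that they sum to 1.
exp_abundances() {
    local fasta=$1 seed=${2:-1}
    awk -v seed="$seed" '
        BEGIN { srand(seed) }
        /^>/ {
            name[++n] = substr($1, 2)      # sequence ID without the ">"
            w[n] = -log(1 - rand())        # rand() is in [0,1), so the log is defined
            total += w[n]
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%.4f\n", name[i], w[i] / total
        }
    ' "$fasta"
}
```

Running exp_abundances bugs.fa 42 a few times with different seeds shows how a handful of genomes typically dominate while the rest sit at low abundance, which is the "metagenome-like" shape described above.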

The coverage is decided on a per-reference-sequence level, so if a bacterial assembly has more than one contig, you may want to glue them together first with fuse.sh before concatenating them with the other references.
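To show what gluing contigs together amounts to, here is a plain awk stand-in that joins all contigs of one assembly into a single record separated by runs of N. It is not fuse.sh and may differ from it in details; the function name fuse_contigs, the record name argument, and the padding length of 300 are assumptions chosen for this sketch.

```shell
# Stand-in for illustration; this is NOT fuse.sh, just the same general idea.

# Join every sequence in a FASTA file into one record, padding with Ns.
#   $1 = input FASTA, $2 = name for the fused record, $3 = Ns between contigs
fuse_contigs() {
    local fasta=$1 name=$2 pad=${3:-300}
    awk -v name="$name" -v pad="$pad" '
        BEGIN { for (i = 0; i < pad; i++) gap = gap "N" }
        /^>/  { if (seen++) seq = seq gap; next }   # add a gap before every contig but the first
              { seq = seq $0 }                      # append sequence lines
        END   { print ">" name; print seq }
    ' "$fasta"
}
```

Each assembly then contributes exactly one reference sequence, so the per-sequence abundance applies to the whole genome rather than to each contig separately. The fused files can be concatenated with cat as in the examples in this thread.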


OMG, BBMap can do anything!


I hadn't thought of BBMap... I swear that thing does everything now. Thank you!


@Brian Bushnell: I haven't used BBMap before. Could you please explain how to use it to make a simulated metagenome from 10 whole bacterial genomes, with a different abundance for each genome? Thank you so much.


You would do something like this:

cat bacteria1.fa bacteria2.fa > all.fa   # and so forth: list every genome file before the ">"
randomreads.sh ref=all.fa out=reads.fq len=150 paired reads=10000000 metagenome

Then it will generate reads with different coverage for each sequence in the reference.


Thanks for the reply, Brian. Is there a way to know the coverage of each bacterium in my metagenome? Thank you.


Is there any way to run this so that it's species/isolate aware? The model would be more representative of real communities if it assigned probabilities to species/isolates rather than to individual sequences.
