Question: Metagenome Read Simulators
14 months ago
London, UK
alesssia520

Hello everyone,

I need to simulate a few metagenomics samples and I was thinking of using BBMap's simulator in "metagenome" mode.

The problem is that I need to simulate the effect of PCR amplification, and I thought of including identical duplicates but with different quality scores.

Does anyone have any suggestions on how to do this? Or does anyone know whether BBMap already simulates PCR duplicates (and if so, how do I modify their proportion)?

Thanks a lot!

It probably does not simulate PCR dups. You could go the route you mention above.

coverage=X will automatically set "reads" to a level that will give X average coverage (decimal point is allowed).
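As a rough illustration of what setting a coverage target implies, here is the usual back-of-the-envelope arithmetic. This is only a sketch: the function name and parameters are hypothetical, and it is not taken from BBMap's source, which may compute the number differently.

```python
import math

def reads_for_coverage(total_bases, coverage, read_length, paired=False):
    """Estimate how many reads (or pairs, if paired=True) give `coverage`.

    Standard approximation: coverage = sequenced bases / reference bases.
    """
    bases_per_fragment = read_length * (2 if paired else 1)
    return math.ceil(coverage * total_bases / bases_per_fragment)

# Example: a 5 Mbp reference, 10.5x coverage, 150 bp single-end reads
print(reads_for_coverage(5_000_000, 10.5, 150))
```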

metagenome will assign each scaffold a random exponential variable, which decides the probability that a read be generated from that scaffold. So, if you concatenate together 20 bacterial genomes, you can run randomreads and get a metagenomic-like distribution. It could also be used for RNA-seq when using a transcriptome reference.
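To make the idea concrete, here is a minimal sketch of that kind of abundance model: one exponential random weight per scaffold, normalised into read-sampling probabilities. The function name is hypothetical, the rate of 1.0 is arbitrary, and whether BBMap also scales by scaffold length is not stated here, so treat this as an illustration rather than a reimplementation.

```python
import random
from collections import Counter

def simulate_abundances(scaffold_names, n_reads, seed=0):
    """Draw one exponential weight per scaffold and sample read origins."""
    rng = random.Random(seed)
    # One exponential random variable per scaffold (rate chosen arbitrarily).
    weights = [rng.expovariate(1.0) for _ in scaffold_names]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Pick the source scaffold of each read according to those probabilities.
    picks = rng.choices(scaffold_names, weights=probs, k=n_reads)
    return probs, Counter(picks)

probs, counts = simulate_abundances(["genomeA", "genomeB", "genomeC"], 10_000)
for name, p in zip(["genomeA", "genomeB", "genomeC"], probs):
    print(name, round(p, 3), counts[name])
```

A few scaffolds typically end up dominating while most get few reads, which is the metagenome-like skew described above.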

The coverage is decided on a per-reference-sequence level, so if a bacterial assembly has more than one contig, you may want to glue its contigs together first, before concatenating them with the other references.

Thanks, I think I will just generate the metagenomics samples, then randomly select a varying percentage of reads, duplicate them, and finally shuffle their quality scores. Will this work in your opinion?
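For what it's worth, the plan could be sketched roughly like this in Python. Everything here is hypothetical (the function name, the `_dup` name suffix, holding reads in memory as `(name, seq, qual)` tuples), and it gives each selected read exactly one extra copy.

```python
import random

def add_pcr_duplicates(reads, dup_fraction, seed=0):
    """Return reads plus duplicates of a random subset, in shuffled order.

    reads: list of (name, seq, qual) tuples.
    dup_fraction: fraction of the ORIGINAL reads that get one extra copy.
    """
    rng = random.Random(seed)
    n_dup = round(dup_fraction * len(reads))
    chosen = rng.sample(range(len(reads)), n_dup)
    out = list(reads)
    for i in chosen:
        name, seq, qual = reads[i]
        q = list(qual)
        rng.shuffle(q)  # same bases, quality characters in a new order
        out.append((name + "_dup", seq, "".join(q)))
    rng.shuffle(out)  # so duplicates are not all at the end of the file
    return out

reads = [("r%d" % i, "ACGTACGT", "IIIIFFF#") for i in range(10)]
for rec in add_pcr_duplicates(reads, 0.3):
    print(rec)
```

One caveat: shuffling the quality string within a read removes the usual drop in quality toward the 3' end, so the duplicates may look less realistic than the originals.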

You could check whether there are PCR dups after generating the metagenome, using Clumpify (see "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files"). If there are none, then proceed with your plan as stated above.

Good point. Since I need to keep the percentage of duplicates equal to fixed values, I should first check whether there are any and add or remove some if needed! Thanks!
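A quick way to do that bookkeeping, as a sketch only: count exact-sequence duplicates and work out how many more copies are needed to hit a target fraction. This is a simple stand-in, not Clumpify, and both function names are hypothetical.

```python
from collections import Counter

def duplicate_fraction(sequences):
    """Fraction of reads that are extra copies of an already-seen sequence."""
    counts = Counter(sequences)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return (n - len(counts)) / n

def duplicates_to_add(n_reads, n_extra, target):
    """Copies to add so that (n_extra + d) / (n_reads + d) equals target.

    A negative result means that many duplicates should be removed instead.
    Requires 0 <= target < 1.
    """
    return round((target * n_reads - n_extra) / (1.0 - target))

seqs = ["ACGT", "ACGT", "CCCC", "GGGG"]
print(duplicate_fraction(seqs))
print(duplicates_to_add(n_reads=100, n_extra=0, target=0.2))
```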

