Question: Shred a genome to short reads
merwright wrote, 3.8 years ago:

I'm working on assemblies and want to create a mock metagenome assembly. My research uses biochemical approaches to enrich eukaryotic DNA, followed by computational enrichment to recover the genome.

Question: I would like to take a small, complete eukaryotic genome (approximately 20-40 Mb), along with several complete bacterial genomes (~5 Mb each), and shred them all into random fragments of 100-250 bp.

Each genome would be shredded 10-20 times, randomly and independently, so that fragments from different passes overlap. All the resulting files would then be merged into one mock FASTA file simulating an NGS library that has been cleaned and is ready for assembly.
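Because each independent pass covers the whole genome once, the number of passes is the expected coverage. A quick back-of-the-envelope check of how many fragments that produces is sketched below, assuming fragment lengths are drawn uniformly from 100-250 bp (mean 175 bp). The genome sizes and the 15x coverage in the example are illustrative values, and `fragments_needed` is a hypothetical helper name.

```python
import math


def fragments_needed(genome_size_bp, coverage, min_len=100, max_len=250):
    """Estimate how many fragments give the requested mean coverage.

    Assumes fragment lengths are uniform on [min_len, max_len], so the mean
    fragment length is the midpoint of that range.
    """
    mean_len = (min_len + max_len) / 2
    return math.ceil(genome_size_bp * coverage / mean_len)


# Hypothetical 30 Mb eukaryote and 5 Mb bacterium, each shredded 15 times
print(fragments_needed(30_000_000, 15))
print(fragments_needed(5_000_000, 15))
```

This is only an estimate: when a genome is tiled end to end, the last fragment of each pass is usually shorter than 100 bp, so the real count differs slightly.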

I've spent several weeks searching for "genome shredding" and variations of that term. Can anyone suggest software that already does some of this, or a framework I could build on? This is the process I have come up with so far:

  1. The input file is a single line containing the complete, assembled genome.
  2. Each shuffle repeatedly selects a random number between 100 and 250, takes that many nucleotides, and writes them to a new file in FASTA format:


>fragment_1
ATATATATA...


>fragment_2
GCGCGCG...

  3. The 10 separate FASTA files for each organism are concatenated with cat into mocksequencing.fasta.
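The steps above can be sketched in plain Python without extra dependencies. This is a minimal illustration under a few assumptions: the genome files are ordinary FASTA with a `>` header line (text before the first header is ignored), each pass tiles the sequence end to end with random-length fragments, and the header scheme `name_passN_fragM` is a hypothetical naming choice. All function names are illustrative, not part of Biopython or any other library.

```python
import random
import sys


def read_fasta(lines):
    """Yield (name, sequence) for each record in an iterable of FASTA lines.

    The name is the first word of the header; sequences are upper-cased.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        elif name is not None:  # ignore any text before the first header
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def shred_once(seq, rng, min_len=100, max_len=250):
    """One pass: cut seq end to end into consecutive random-length fragments."""
    fragments = []
    start = 0
    while start < len(seq):
        length = rng.randint(min_len, max_len)  # both bounds inclusive
        fragments.append(seq[start:start + length])  # last piece may be shorter
        start += length
    return fragments


def shred_genome(name, seq, passes, rng, min_len=100, max_len=250):
    """Yield (header, fragment) pairs from `passes` independent shreddings."""
    for p in range(1, passes + 1):
        for i, frag in enumerate(shred_once(seq, rng, min_len, max_len), start=1):
            yield f"{name}_pass{p}_frag{i}", frag


def fasta_lines(records, width=60):
    """Yield FASTA-formatted lines, wrapping sequences at `width` characters."""
    for header, seq in records:
        yield f">{header}\n"
        for j in range(0, len(seq), width):
            yield seq[j:j + width] + "\n"


if __name__ == "__main__":
    # Toy demo on a random 1 kb sequence, written to stdout. For real data,
    # loop over read_fasta(open(path)) for each genome file and write every
    # record to one output file (e.g. mocksequencing.fasta).
    rng = random.Random(1)
    toy = "".join(rng.choice("ACGT") for _ in range(1000))
    sys.stdout.writelines(fasta_lines(shred_genome("toy", toy, passes=2, rng=rng)))
```

Writing all organisms into a single output handle replaces the separate `cat` step. Passing one seeded `random.Random` instance around makes a mock library reproducible.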

This doesn't seem too complicated or unusual for many studies, so writing it myself feels redundant. Is this covered somewhere in the Biopython documentation? Thanks!

leekaiinthesky wrote, 3.8 years ago:

I believe the term you're looking for is "read simulator".

Take a look at Metasim. wgsim may also be of interest.
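Read simulators typically draw reads from random positions rather than tiling the genome pass by pass. The idea can be illustrated with a simplified, stdlib-only sketch: it samples random-length reads from uniformly random start positions on a random strand until a target coverage is reached. It models no sequencing errors, quality scores, or paired ends, so it is not a substitute for Metasim or wgsim and does not reproduce how either tool works internally. `sample_reads` and `reverse_complement` are hypothetical helper names.

```python
import random

# Complement table; lowercase (soft-masked) bases keep their case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_reads(seq, coverage, rng, min_len=100, max_len=250):
    """Draw reads from uniformly random start positions and strands until the
    bases sampled reach coverage * len(seq). No errors, qualities, or pairs."""
    if len(seq) < min_len:
        raise ValueError("sequence is shorter than min_len")
    target = coverage * len(seq)
    total, reads = 0, []
    while total < target:
        length = rng.randint(min_len, min(max_len, len(seq)))
        start = rng.randint(0, len(seq) - length)  # read lies fully inside seq
        read = seq[start:start + length]
        if rng.random() < 0.5:  # take the reverse strand half of the time
            read = reverse_complement(read)
        reads.append(read)
        total += length
    return reads


if __name__ == "__main__":
    rng = random.Random(7)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    reads = sample_reads(genome, coverage=10, rng=rng)
    print(len(reads), "reads,", sum(map(len, reads)), "bases")
```

Because every read must fit inside the sequence, the first and last ~250 bp receive lower coverage, and circular bacterial chromosomes are treated as linear. Real simulators add error models and read pairing on top of this basic sampling.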

And a relevant Biostars post that includes other suggestions: "What Ngs Read Simulators Are Available For Paired-End Data?"


merwright replied, 3.8 years ago:

Thank you! Knowing what these are commonly called is a huge help. I will look into them.