Shred a genome to short reads
8.0 years ago
merwright • 0

I'm working on assemblies and want to create a mock metagenome assembly. My research uses biochemical approaches to enrich eukaryotic DNA, followed by computational enrichment to recover the genome.

Question: I would like to take a small, complete eukaryotic genome (approximately 20-40 Mb), along with several complete bacterial genomes (~5 Mb each), and shred them all into fragments of random length between 100 and 250 bp.

That means each genome would be shredded 10-20 times, randomly and independently, so that fragments from different passes overlap. All the separate files would then be merged into one mock FASTA file simulating an NGS library that has been cleaned and is ready for assembly.
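As a rough sanity check on the sizes involved, the number of fragments follows from genome length, number of passes (coverage), and mean fragment length. The sketch below assumes fragment lengths are uniform over 100-250 bp (so the mean is 175 bp); `reads_needed` is a hypothetical helper name, not part of any library.

```python
def reads_needed(genome_length, coverage, min_len=100, max_len=250):
    """Approximate fragment count for a target coverage.

    Assumes fragment lengths are uniformly distributed between
    min_len and max_len, so the mean length is their midpoint.
    """
    mean_len = (min_len + max_len) / 2
    return round(genome_length * coverage / mean_len)


# A 30 Mb eukaryote and a 5 Mb bacterium, each shredded 10 times.
print(reads_needed(30_000_000, 10))  # 1714286
print(reads_needed(5_000_000, 10))   # 285714
```

So a single 30 Mb genome at 10 passes already yields on the order of 1.7 million fragments, which is worth keeping in mind when choosing how to store and name them.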

I've been searching for "genome shredding" and related terms for several weeks. Can anyone suggest software that already does part of this, or some kind of framework I could build on to code it? This is the process I have thought of so far:

  1. The input file is one line containing the complete, assembled genome
  2. Each shuffle consists of selecting a number between 100 and 250, taking that many nucleotides, and writing them to a new file in FASTA format:

>random1
ATATATATA (sequence)
>random2
GCGCGCG (sequence)

  3. The 10 separate FASTA files for each organism are all concatenated (cat) into mocksequencing.fasta
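The steps above could be sketched in plain Python roughly as follows. This is a minimal illustration under stated assumptions, not an existing tool: the function names are hypothetical, each pass walks the genome end to end cutting consecutive fragments, and a trailing piece shorter than the minimum length is dropped (that last choice is not in the description above).

```python
import random


def shred_pass(sequence, rng, min_len=100, max_len=250):
    """Cut one sequence end to end into consecutive random-length fragments."""
    fragments = []
    pos = 0
    while pos < len(sequence):
        size = rng.randint(min_len, max_len)  # inclusive on both ends
        fragment = sequence[pos:pos + size]
        # Assumption: drop a trailing piece shorter than min_len.
        if len(fragment) >= min_len:
            fragments.append(fragment)
        pos += size
    return fragments


def shred_genome(sequence, passes=10, seed=0, min_len=100, max_len=250):
    """Run several independent passes so fragments from different passes overlap."""
    rng = random.Random(seed)  # seeded so a mock library is reproducible
    reads = []
    for _ in range(passes):
        reads.extend(shred_pass(sequence, rng, min_len, max_len))
    return reads


def to_fasta(reads, prefix="random"):
    """Format reads as FASTA records named prefix1, prefix2, ..."""
    return "".join(f">{prefix}{i}\n{read}\n" for i, read in enumerate(reads, start=1))
```

Calling `to_fasta(shred_genome(genome_string, passes=10), prefix="ecoli_")` once per organism and concatenating the results would give the merged mock file. Note that every pass here starts at position 0, so the first fragments of each pass share a start point; the cut points then drift apart further along the genome.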

I feel like this isn't too complicated or unusual for a lot of studies, so writing it myself would probably be redundant. Is this covered somewhere in the Biopython documentation? Thanks!

8.0 years ago

I believe the term you're looking for is read simulator.

Take a look at MetaSim. wgsim may also be of interest.

There is also a relevant Biostars post that includes other suggestions: "What Ngs Read Simulators Are Available For Paired-End Data?"


Thank you! Knowing what these are commonly referred to is a huge help. I will look into that.
