Chloroplast and mitochondria genome assembly with SPAdes and Tadpole: correct coverage and k-mers
4.6 years ago

Hi! I'm very new to this field, so I fear I'm making very dumb mistakes.

My process for assembling these genomes is the following:

We did whole-genome sequencing of purple maize (2x151 bp). Then I mapped all the reads against the reference genome (10 chromosomes + mitochondria + chloroplast) using bowtie2.

Then, using samtools, I extracted the alignments (BAMs) for mitochondria and chloroplast only, and with samtools fastq I extracted the reads mapped against the reference mitochondria and chloroplast. I sorted the reads by name so that each FASTQ file has the same number of reads.
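
To make the extraction step concrete, here is a minimal Python sketch that only builds the samtools command lines (it does not run them). The file names, the contig name "Pt", and the exact flags are assumptions for illustration, not necessarily what I ran; the contig names must match those in the reference FASTA.

```python
# Sketch only: builds command lines for review, nothing is executed.
# File names and the contig name are hypothetical placeholders.

def organelle_read_commands(bam, contig, prefix):
    """Return argv lists: extract one contig, name-sort, write paired FASTQ."""
    region_bam = f"{prefix}.{contig}.bam"
    sorted_bam = f"{prefix}.{contig}.namesorted.bam"
    return [
        # Region queries need a coordinate-sorted, indexed BAM.
        ["samtools", "view", "-b", "-o", region_bam, bam, contig],
        # Name-sort so that mates sit next to each other.
        ["samtools", "sort", "-n", "-o", sorted_bam, region_bam],
        # -1/-2 receive the two mates; singletons (-s) and other reads (-0)
        # are discarded, so both FASTQ files end up with the same read count.
        ["samtools", "fastq",
         "-1", f"{prefix}.{contig}_R1.fastq",
         "-2", f"{prefix}.{contig}_R2.fastq",
         "-0", "/dev/null", "-s", "/dev/null", "-n", sorted_bam],
    ]


for cmd in organelle_read_commands("maize.sorted.bam", "Pt", "maize"):
    print(" ".join(cmd))
```

Discarding singletons means that read pairs with only one mate mapped to the organelle are dropped entirely, which is one possible way to keep the two FASTQ files in sync.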

I want to use these reads to do de novo assembly.

For chloroplast, I have a total coverage of 5600x. Subsampling to 60x or 90x coverage and using k-mers of 37, 47, 57, 67 or similar, I get a highly fragmented assembly.
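
For reference, the sampling fraction for a target coverage is just target coverage divided by total coverage. A small Python sketch of that arithmetic with the chloroplast numbers above (the function name is hypothetical, and it assumes coverage is roughly uniform along the genome):

```python
def subsample_fraction(total_coverage, target_coverage):
    """Fraction of reads to keep so that coverage drops to roughly the target."""
    if total_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    # Never ask for more reads than exist.
    return min(1.0, target_coverage / total_coverage)


# Chloroplast: 5600x total, sampled down to 60x or 90x.
for target in (60, 90):
    fraction = subsample_fraction(5600, target)
    print(f"{target}x target: keep {fraction:.2%} of the reads")
```

With 5600x total, that means keeping only about 1.1% of the reads for 60x and about 1.6% for 90x.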

For mitochondria, I have a total coverage of 1400x. Subsampling to 60x coverage with k-mers of 47, 57, 67, 77, I got an assembly of 46 contigs; with k-mers of 45, 65, 85, 95, I got 31 contigs.
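
As far as I understand the SPAdes documentation, the values given to -k must be odd, below 128, and listed in ascending order, and each k should also be shorter than the reads. A minimal Python sketch that checks a k-mer list against those constraints (the helper name and the 127 upper limit as a parameter are my own assumptions; the read length of 151 comes from the 2x151 bp run):

```python
def check_spades_kmers(kmers, read_length, max_k=127):
    """Return a list of problems with a SPAdes-style k-mer list (empty if none)."""
    problems = []
    # Ascending order without duplicates.
    if list(kmers) != sorted(set(kmers)):
        problems.append("k values should be strictly ascending")
    for k in kmers:
        if k % 2 == 0:
            problems.append(f"k={k} is even")
        if k > max_k:
            problems.append(f"k={k} exceeds the maximum of {max_k}")
        if k >= read_length:
            problems.append(f"k={k} is not shorter than the reads ({read_length} bp)")
    return problems


for ks in ([47, 57, 67, 77], [45, 65, 85, 95]):
    print(ks, check_spades_kmers(ks, read_length=151) or "ok")
```

Both k-mer sets I used for mitochondria pass these checks, so the fragmentation is probably not caused by invalid k values as such.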

Then I used Tadpole with 100x coverage and k=100 and got 222 contigs. I also tried extending and merging my reads and assembling with 250x coverage and k=250, but got 405 contigs.

I think my main problem is that I'm not using the correct values for coverage and k-mers.

Does anyone have any advice about this? Thanks a lot in advance :)

assembly genome next-gen sequence alignment • 1.1k views
