Question

Mitochondrial de novo assembly

9.6 years ago

mandar.bobade ▴ 20

Dear all,

I am trying to assemble a genome with an expected size of ~800 kb and have tried a number of things so far. The details of my study are as follows:

I have 24 GB + 24 GB of mitochondrial R1 and R2 data, respectively, with a read length of 101. I generated contigs with Velvet at multiple k-mer sizes, but the N50 was too low (~1000) and the number of contigs was too high.

Next, I subsampled the data (from 200% to 25% removal of reads from the original file), because I was advised to lower the coverage and the file size. I then ran the subsampled data through VelvetOptimiser to generate contigs. This raised the N50 and substantially reduced the number of contigs.
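For readers comparing runs the same way, N50 is the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length. Below is a minimal sketch of that calculation. The helper names `fasta_lengths` and `n50` are illustrative and not part of Velvet or VelvetOptimiser, and the parser assumes a plain FASTA file with `>` header lines.

```python
def fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Contig length at which the running total first reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Hypothetical file name; replace with the contig file of your assembler.
    with open("contigs.fa") as handle:
        print(n50(fasta_lengths(handle)))
```

N50 alone says nothing about correctness: a misassembly that joins two contigs raises it, so it is best read together with the contig count and the total assembled length.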

After that, I further compressed the contig file using the AMOS tool and ran an alignment with Bowtie2, using the -x 0 and -I 500 options, against the AMOS output FASTA file (contigs + singletons). Bowtie2 gave a total alignment rate of ~60% in both cases: for the best subsampled output (judged by N50 and number of contigs) and for the contig file without subsampling.

I also tried assembling with SOAPdenovo2, since it has a gap-closing step for building scaffolds from contigs.

But at this point I am at an impasse, since I do not know how to validate these assemblies. Please suggest a sound way to get from the contig files to my genome.

Regards,
Mandar

Assembly • 3.8k views

updated 2.3 years ago by Ram 43k • written 9.6 years ago by mandar.bobade ▴ 20

Ram · Answer 1 · 2014-10-04

If I understand your question correctly, you have 48 GB / ~800 kb = ~60,000x coverage, which is far too much.

If you subsample your data to 0.1%, i.e. to ~60x coverage, Velvet usually gives a far better result. My command:

seqtk sample -s100 read1.fq 240000 > sub1.fq
seqtk sample -s100 read2.fq 240000 > sub2.fq
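Using the same seed (-s100) for both files keeps the read pairs in sync. The read count in the commands can be checked with a little arithmetic; the sketch below does that under the same rough assumption as above, namely that 48 GB corresponds to about 48 billion sequenced bases. FASTQ files also store headers and quality strings, so the real base count is lower. The function names are illustrative only.

```python
import math


def estimated_coverage(total_bases, genome_size):
    """Average depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


def pairs_for_coverage(target_coverage, genome_size, read_length):
    """Number of read pairs needed; each pair contributes 2 * read_length bases."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


if __name__ == "__main__":
    # Assumption: 48 GB of data is roughly 48e9 bases, genome is ~800 kb.
    print(estimated_coverage(48e9, 800e3))
    # Read pairs needed for ~60x with 101-base paired reads.
    print(pairs_for_coverage(60, 800_000, 101))
```

With these inputs the second call gives a little under 240,000 pairs, which is where the round figure passed to seqtk comes from.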

Another option is to apply digital normalization, as described at http://khmer.readthedocs.org/en/v1.1/guide.html
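To give an idea of what digital normalization does, here is a toy sketch of the median k-mer abundance rule: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a cutoff. This is not khmer's implementation. It uses an exact in-memory counter, ignores reverse complements and read pairing, and the function names and the `k` and `cutoff` values are illustrative assumptions.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=20, cutoff=20):
    """Keep a read only while the median count of its k-mers is below cutoff."""
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    # Ten copies of one read plus one unique read; small k and cutoff for the demo.
    reads = ["ACGTACGTAC"] * 10 + ["TTTTGGGGCC"]
    print(digital_normalize(reads, k=4, cutoff=3))
```

Unlike random subsampling, this discards reads mainly from regions that are already well covered, so low-coverage regions lose less data.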

Ram · Answer 2 · 2015-01-20

9.3 years ago

Brice Sarver ★ 3.8k

Use an iterative assembly approach, like the one implemented in ARC (Assembly by Reduced Complexity).

updated 2.3 years ago by Ram 43k • written 9.3 years ago by Brice Sarver ★ 3.8k

Ram · Answer 3 · 2015-01-20

9.3 years ago

Antonio R. Franco ★ 5.1k

If you can get the sequence of a similar AND trusted mitochondrial genome, you can improve your assembly by comparing it against that reference with programs such as Mauve and/or ACT.

updated 2.3 years ago by Ram 43k • written 9.3 years ago by Antonio R. Franco ★ 5.1k