Mitochondrial de novo assembly
9.6 years ago

Dear all,

I am trying to generate an assembly of the expected genome size (~800 kb) and have tried a number of things so far. Here are the details of my study:

I have 24 GB + 24 GB of mitochondrial R1 and R2 data respectively, with a read length of 101. I generated contigs with Velvet and got results for multiple k-mer sizes, but the N50 was too low (~1000) and the number of contigs was too high.
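For readers comparing assemblies by N50 as described here: N50 is the contig length at which contigs of that length or longer account for at least half of the total assembled bases. A minimal sketch in Python, assuming the contig lengths have already been read from the assembler's FASTA output (the function name `n50` and the example lengths are illustrative, not from the post):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Illustrative lengths only, not real data from this thread.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

N50 alone says nothing about correctness, so it is best read alongside the total assembly size and the read alignment rate.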

Next, I subsampled the data (from 200% to 25% removal of reads from the original file), as I had been advised to lower the coverage and file size. I then ran the subsampled data through VelvetOptimiser for contig generation. This substantially increased the N50 and decreased the number of contigs.

After that, I further reduced the contig file with the AMOS tool and tried aligning with Bowtie2, with the -x 0 and -I 500 options, using the AMOS output FASTA file (contigs plus singletons). Bowtie2 reported an overall alignment rate of ~60% in both cases: for the best subsampled output (judged by N50 and number of contigs) and for the contig file produced without subsampling.

I also tried assembling with SOAPdenovo2, since it has a gap-closing step for building scaffolds from contigs.

At this point I am at an impasse, since I do not know how to validate these assemblies. Please suggest a sound way to arrive at my genome from the contig files.

Regards,
Mandar

Assembly • 3.8k views
9.5 years ago
rtliu ★ 2.2k

If I understand your question correctly, you have 48 GB / ~800 kb = ~60,000x coverage, which is far too much.
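The arithmetic behind that estimate can be sketched as follows. It assumes, as the estimate does, that the 48 GB figure approximates the total number of sequenced bases; FASTQ files also hold quality strings and headers, so the true base count is lower. The function name is illustrative:

```python
def estimated_coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome length."""
    return total_bases / genome_size


# 48 GB treated as ~48e9 bases, against an expected ~800 kb genome.
coverage = estimated_coverage(48e9, 800e3)
print(f"~{coverage:,.0f}x")
```

Even if the real base count is half that figure, the depth is still orders of magnitude above what Velvet needs.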

If you subsample your data to 0.1%, i.e. down to ~60x coverage, Velvet usually gives a far better result. My commands:

seqtk sample -s100 read1.fq 240000 > sub1.fq
seqtk sample -s100 read2.fq 240000 > sub2.fq
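The figure of 240000 reads per file is roughly what ~60x needs for an ~800 kb genome with 101 bp paired reads. A small sketch of that calculation, in case you want a different target coverage (the function name and parameters are illustrative, not part of seqtk):

```python
import math


def read_pairs_for_coverage(target_coverage, genome_size, read_length):
    """Read pairs needed so that both mates together reach the target coverage."""
    bases_needed = target_coverage * genome_size
    # Each pair contributes two reads of read_length bases.
    return math.ceil(bases_needed / (2 * read_length))


pairs = read_pairs_for_coverage(60, 800_000, 101)
print(pairs)  # number of reads to request from each of read1.fq and read2.fq
```

Using the same -s seed for both files, as in the commands above, is what keeps the mates paired after sampling.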

Another option is to apply digital normalization, as described at http://khmer.readthedocs.org/en/v1.1/guide.html

9.3 years ago
Brice Sarver ★ 3.8k

Use an iterative assembly approach, like the one implemented in ARC (Assembly by Reduced Complexity).

9.3 years ago

If you can get the sequence of a similar and trusted mitochondrial genome, you can improve your assembly by comparing against it with programs such as Mauve and/or ACT.
