Question

Using the same reads to assemble mitochondrial and chloroplast genomes

8.3 years ago

novice ★ 1.1k

I have paired-end reads and two closely related references, one for the mitochondrion and another for the chloroplast. Can I simply map the reads to the mitochondrial reference (with bwa) to assemble the mitochondrial genome, then map them to the chloroplast reference to assemble the chloroplast genome? Is there any problem with this approach? I'd appreciate ideas or suggestions from anyone who has done something similar before.

Assembly Mapping mtDNA • 2.3k views

updated 8.3 years ago by Brice Sarver ★ 3.8k • written 8.3 years ago by novice ★ 1.1k

Ram · Accepted Answer · 2015-12-30

8.3 years ago

Brice Sarver ★ 3.8k

Remember that mapping is not necessarily assembling a genome - you're just placing the reads on that reference. You could use your mapped data to 1) call a consensus sequence or 2) call variants and inject them back into the reference. If your references are closely related and lack repetitive elements, you should be fine calling variants and using them to identify sample-specific variation and build sample-specific references.
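To make option 1 concrete, here is a minimal Python sketch of consensus calling by per-position majority vote. It assumes the base counts at each reference position have already been extracted from the mapped reads (for example from a pileup). The function name `call_consensus`, the `min_depth` threshold, and the input layout are all illustrative assumptions rather than any tool's interface, and in practice you'd rely on an established variant caller instead of a toy like this.

```python
from collections import Counter


def call_consensus(base_counts, min_depth=3):
    """Return a consensus string by per-position majority vote.

    base_counts -- list of Counter objects, one per reference position,
                   counting the bases seen in reads mapped at that position
    min_depth   -- hypothetical threshold; positions covered by fewer
                   reads than this are masked with 'N'
    """
    consensus = []
    for counts in base_counts:
        depth = sum(counts.values())
        # Mask sites with too little evidence (and sites with no reads at all).
        if depth < max(min_depth, 1):
            consensus.append("N")
            continue
        base, _ = counts.most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Toy input: four reference positions with made-up read counts.
base_counts = [
    Counter({"A": 10}),          # unanimous
    Counter({"C": 2, "T": 8}),   # majority differs from a hypothetical 'C' reference
    Counter({"G": 1}),           # too shallow to call at the default threshold
    Counter({"T": 5}),
]
print(call_consensus(base_counts))
```

Masking low-depth sites with `N` keeps the reference base from silently leaking into the sample's sequence where the reads say nothing.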

Additionally, you can attempt to assemble these de novo. In this case, you'd take your assembly and BLAT it to your mitochondrial/chloroplast reference to identify which contig(s) are from these sequences. However, I'd recommend an iterative assembly approach, like the one implemented in ARC, which I've used to do exactly what you want to do.
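The contig-identification step for the de novo route could be sketched as follows in Python. This assumes the alignment hits of contigs against both references have been exported in the common 12-column BLAST-style tabular layout (query, subject, percent identity, alignment length, ...). The function name `assign_contigs` and both thresholds are hypothetical and would need tuning for real data.

```python
from collections import defaultdict


def assign_contigs(hit_lines, min_identity=90.0, min_aligned=500):
    """Assign each contig to the reference it aligns to most.

    hit_lines -- iterable of tab-separated hit lines; only the first four
                 columns (query, subject, % identity, length) are used
    Returns {contig: reference} for contigs passing both thresholds.
    Note: alignment lengths are simply summed, so overlapping hits
    are counted more than once in this sketch.
    """
    aligned = defaultdict(lambda: defaultdict(int))
    for line in hit_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        contig, ref = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity >= min_identity:
            aligned[contig][ref] += length

    assignments = {}
    for contig, per_ref in aligned.items():
        best_ref, total = max(per_ref.items(), key=lambda kv: kv[1])
        if total >= min_aligned:
            assignments[contig] = best_ref
    return assignments


def make_hit(contig, ref, identity, length):
    """Build a toy 12-column hit line; the trailing columns are placeholders."""
    cols = [contig, ref, str(identity), str(length),
            "0", "0", "1", str(length), "1", str(length), "0.0", "100"]
    return "\t".join(cols)


hits = [
    make_hit("contig1", "mito_ref", 98.5, 1200),
    make_hit("contig1", "chloro_ref", 95.0, 300),
    make_hit("contig2", "chloro_ref", 99.1, 2500),
    make_hit("contig3", "mito_ref", 85.0, 900),  # below the identity cutoff
]
print(assign_contigs(hits))
```

Contigs with no qualifying hit (like `contig3` here) are left out of the result, which is where nuclear contigs would be expected to end up.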

updated 4.3 years ago by Ram 43k • written 8.3 years ago by Brice Sarver ★ 3.8k

Thank you Brice. ARC sounds like the tool I need. Do you know if it would deal with multiple copies of the genome within the reads? I expect there to be several copies of mitochondrial/chloroplast genomes.

updated 4.3 years ago by Ram 43k • written 8.3 years ago by novice ★ 1.1k

Yes, that should work fine. That basically means that the mitochondrion/chloroplast will have higher coverage relative to nuclear markers. ARC, in particular, will find reads that are similar to either, split them into pools, and attempt to assemble those pools de novo. If you expect heteroplasmy, this is a somewhat more complicated question informatically; the short answer is you will be able to recover distinct haplotypes given enough variation and coverage.
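The idea of splitting reads into per-target pools can be illustrated with a toy Python sketch. To be clear, this is not ARC's implementation: as far as I know ARC recruits reads by aligning them to the targets, whereas shared k-mers are used here purely as a stand-in for that similarity search. The names `split_into_pools`, `k`, and `min_shared`, and the tiny sequences, are made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_into_pools(reads, references, k=5, min_shared=2):
    """Place each read in the pool of the reference it shares most k-mers with.

    reads      -- dict of read name -> sequence
    references -- dict of target name -> sequence
    Reads sharing fewer than min_shared k-mers with every target go to
    the 'unassigned' pool (e.g. nuclear reads).
    """
    # Index both strands of each target, since reads come from either strand.
    ref_kmers = {
        name: kmers(seq, k) | kmers(reverse_complement(seq), k)
        for name, seq in references.items()
    }
    pools = {name: [] for name in references}
    pools["unassigned"] = []
    for read_name, seq in reads.items():
        read_k = kmers(seq, k)
        scores = {name: len(read_k & ks) for name, ks in ref_kmers.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= min_shared:
            pools[best].append(read_name)
        else:
            pools["unassigned"].append(read_name)
    return pools


references = {
    "mito": "ACGTACGTTAGCCGATAGGC",
    "chloro": "TTGACCATGGAAACTTTGGA",
}
reads = {
    "r1": "ACGTTAGCCG",  # forward-strand piece of the mito target
    "r2": "CATGGAAACT",  # forward-strand piece of the chloro target
    "r3": "GCCTATCGGC",  # reverse-strand piece of the mito target
    "r4": "GGGGGGGGGG",  # matches neither target
}
print(split_into_pools(reads, references))
```

Each resulting pool would then be assembled de novo on its own, and because the organelle genomes are present in many copies, their pools should be well covered.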

updated 4.3 years ago by Ram 43k • written 8.3 years ago by Brice Sarver ★ 3.8k

Hi Brice, I'd like to know whether ARC works well on a large data set of about 80G that contains a mix of mitochondrial and chloroplast genome reads?

written 8.0 years ago by ant_genome • 0

80G of raw data? I see no reason why it wouldn't perform well. Arguably the most time-consuming step is the read splitting, so this might take a bit of time, but it was designed to handle datasets like this. I'd give it a try. At the very least, it will do better than a complete de novo since you know what you're looking for.

written 8.0 years ago by Brice Sarver ★ 3.8k

What should I consider if my reference sequence has repetitive elements? The chloroplast reference genome has two repetitive elements, IRa and IRb.

(I'm trying to assemble the chloroplast genome of purple maize.)

Thanks for the recommendation! I'll try ARC too :D

written 4.9 years ago by macielrodriguez2 ▴ 50