Question

How to separate plastid, mitochondrial, and nuclear genomes from whole genome sequencing data for plants?



9.2 years ago

Vincent Manzanilla ▴ 40

My dear nerd community,

We are sequencing many genomes of non-model plants today.

One of the first tasks is to separate the plastid, mitochondrial, and nuclear genomes. I expected there to be general guidelines or even a consensus way to do it, but there aren't really.

I should specify that in this case we do not have any closely related reference genomes (nuclear, plastid, or mitochondrial). We like challenges!

So what we do in our lab:

Download from NCBI one file containing all plastid genomes and one containing all mitochondrial genomes
Use the mirabait tool from the MIRA package to separate the reads belonging to the different genomes
Run the assembly for the plastid genome or the nuclear genome separately
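To illustrate the general idea behind the baiting step, here is a minimal sketch of k-mer baiting in plain Python. It is not mirabait's implementation (as far as I know mirabait also matches reads by shared k-mers, but its options and scoring differ); the function names, the choice of canonical k-mers, and the `min_hits` threshold are all hypothetical. Each read is assigned to the reference set (plastid or mitochondrial) with which it shares the most k-mers, and everything else, including ties, is left as "unassigned", which is where the nuclear reads end up.

```python
from typing import Dict, Iterable, Iterator, List, Set

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping k-mers of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical(kmer: str) -> str:
    """Strand-independent form: the smaller of the k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def build_bait(references: Iterable[str], k: int) -> Set[str]:
    """Canonical k-mers of a set of reference sequences (e.g. all NCBI plastomes)."""
    bait: Set[str] = set()
    for ref in references:
        for km in kmers(ref.upper(), k):
            if "N" not in km:  # skip ambiguous bases
                bait.add(canonical(km))
    return bait


def classify_reads(
    reads: Dict[str, str],
    baits: Dict[str, Set[str]],
    k: int,
    min_hits: int = 2,  # hypothetical threshold; tune on real data
) -> Dict[str, List[str]]:
    """Assign each read to the bait set it shares the most k-mers with."""
    bins: Dict[str, List[str]] = {name: [] for name in baits}
    bins["unassigned"] = []
    for read_id, seq in reads.items():
        read_kmers = [canonical(km) for km in kmers(seq.upper(), k)]
        scores = {name: sum(km in bait for km in read_kmers) for name, bait in baits.items()}
        best = max(scores.values(), default=0)
        winners = [name for name, s in scores.items() if s == best]
        # Too few hits or a tie (e.g. plastid-derived DNA inside the
        # mitochondrial genome) -> leave the read unassigned.
        if best >= min_hits and len(winners) == 1:
            bins[winners[0]].append(read_id)
        else:
            bins["unassigned"].append(read_id)
    return bins
```

With only distantly related references, a shorter k catches more divergent reads at the cost of more spurious hits, which is one reason this kind of baiting is "not that perfect".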

Do you have a better way to do it? This method is far from perfect.

Best,
Vincent

plastid mitochondrial non-model nuclear de-novo


Ram · Answer 1 · 2015-02-02

The mirabait tool looks promising, and maybe you could get good results if you started with a wide enough net of possible mitochondrial sequences.

I would also suggest looking at read coverage as a way to separate these genome sources. It varies by species and cell type, but the organellar genomes typically have a much higher copy number than the nuclear genome (in leaf tissue the plastid genome is often more abundant still). You could do a de novo assembly on all reads together, then stratify the contigs into your classes by their average read coverage. You could also attempt an assembly-free approach by doing this at the k-mer level.
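A minimal sketch of the contig-stratification step, assuming you already have a mean read coverage per contig (from read mapping or from your assembler's output). It assumes most contigs are nuclear, so the median contig coverage stands in for the nuclear baseline; the tier names and fold thresholds are hypothetical placeholders that would have to be read off a coverage histogram of your own data.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical tiers: (label, minimum fold over the baseline coverage),
# checked from the highest fold down. The numbers are placeholders.
DEFAULT_TIERS: List[Tuple[str, float]] = [
    ("very_high_copy", 50.0),  # e.g. candidate plastid contigs from leaf tissue
    ("high_copy", 5.0),        # e.g. candidate mitochondrial contigs
]


def stratify_by_coverage(
    contig_cov: Dict[str, float],
    tiers: List[Tuple[str, float]] = DEFAULT_TIERS,
    baseline_label: str = "nuclear_candidate",
) -> Dict[str, str]:
    """Label each contig by its mean coverage relative to the median contig coverage.

    Assumes most contigs are nuclear, so the median approximates nuclear coverage.
    """
    baseline = statistics.median(contig_cov.values())
    if baseline <= 0:
        raise ValueError("median coverage must be positive")
    labels: Dict[str, str] = {}
    for contig, cov in contig_cov.items():
        fold = cov / baseline
        label = baseline_label
        for tier_label, min_fold in tiers:  # highest threshold first
            if fold >= min_fold:
                label = tier_label
                break
        labels[contig] = label
    return labels


coverage = {"c1": 20, "c2": 22, "c3": 18, "c4": 25, "c5": 21, "m1": 300, "p1": 4000}
labels = stratify_by_coverage(coverage)
```

Here the median is 22, so `m1` (about 14-fold) lands in `high_copy` and `p1` (about 180-fold) in `very_high_copy`. Repeats and nuclear copies of organellar DNA will blur these boundaries, so the labels are candidates to check, not final assignments.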

There will be some ambiguity caused by repetitive regions, but you may be able to make progress with this approach. It has the advantage that it does not depend on finding accurate template/related mitochondrial sequences first, so if that task is hard or impossible, it may help. Conversely, if you can find good mitochondrial examples, I think the template approach will be hard to beat.

Ram · Answer 2 · 2015-02-02

Assembly by Reduced Complexity (ARC) takes a set of (annotated!) targets and produces unbiased assemblies. The paper is in review and available on bioRxiv. Our motivation for developing ARC was to do exactly what you want to do: assemble targets for which no close reference sequence is available. It has been great and extremely helpful for non-model genomic work.

FYI: I'm sure you're aware that (some?) chloroplast genomes have an inverted, duplicated region. Assembling these regions is notoriously difficult, and it would make sense to mask one or both copies if the insert size of your libraries does not span it.
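A rough sketch of how one might locate both copies of an inverted repeat in a draft plastid contig and then mask one of them with N. It marks every position covered by a k-mer whose reverse complement also occurs in the sequence, merges those positions into intervals, and drops intervals shorter than `min_len`. This is a self-comparison heuristic of my own, not what ARC or any other tool does; `k=31`, `min_len=1000`, and the function names are hypothetical, and the sketch treats the contig as linear (an inverted repeat split across the origin of a circular assembly will show up as pieces).

```python
from typing import List, Tuple

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


def inverted_repeat_intervals(seq: str, k: int = 31, min_len: int = 1000) -> List[Tuple[int, int]]:
    """Half-open intervals covered by k-mers whose reverse complement also occurs in seq."""
    seq = seq.upper()
    kmer_set = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    covered = [False] * len(seq)
    cover_end = 0  # furthest position already marked, keeps marking O(n)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = revcomp(kmer)
        if rc == kmer or "N" in kmer:  # skip palindromic and ambiguous k-mers
            continue
        if rc in kmer_set:
            for j in range(max(i, cover_end), i + k):
                covered[j] = True
            cover_end = i + k
    intervals: List[Tuple[int, int]] = []
    start = None
    for pos, flag in enumerate(covered + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = pos
        elif not flag and start is not None:
            if pos - start >= min_len:  # ignore short, incidental inverted repeats
                intervals.append((start, pos))
            start = None
    return intervals


def mask_intervals(seq: str, intervals: List[Tuple[int, int]]) -> str:
    """Replace the given half-open intervals with N, keeping the length unchanged."""
    chars = list(seq)
    for start, end in intervals:
        chars[start:end] = "N" * (end - start)
    return "".join(chars)
```

On a typical plastome this should report two long intervals (the two repeat copies); masking one, e.g. `mask_intervals(contig, intervals[1:])`, leaves a single copy for read mapping or assembly checks. Other, shorter inverted repeats may also pass the `min_len` filter, so inspect the intervals before masking.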