Question: How to separate plastid, mitochondrial, and nuclear genomes from whole genome sequencing data for plants?
6.0 years ago by
Vincent Manzanilla20 wrote:

My dear nerd community,

We are sequencing many genomes of non-model plants today.

One of the first tasks is to separate the plastid, mitochondrial, and nuclear genomes. I thought there would be general guidelines or even a consensus method for this, but apparently there is not.

I should specify that in this case we do not have any closely related reference genomes (nuclear, plastid, or mitochondrial). We like challenges!

So what we do in our lab:

1. Download all available plastid genomes and all available mitochondrial genomes from NCBI

2. Use the mirabait tool from the MIRA package to separate the reads belonging to the different genomes

3. Assemble the plastid genome or the nuclear genome from the separated reads
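The baiting idea in step 2 can be illustrated with a small Python sketch. This is not mirabait's implementation, only a toy version of k-mer baiting under stated assumptions: the function names, the k-mer size, and the `min_hits` threshold are all hypothetical, and reads and references are plain strings already loaded into memory.

```python
from typing import Iterable, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_bait_index(references: Iterable[str], k: int) -> Set[str]:
    """Collect all k-mers of the bait sequences, on both strands."""
    index: Set[str] = set()
    for ref in references:
        for km in kmers(ref, k):
            index.add(km)
            index.add(reverse_complement(km))
    return index


def bait_reads(
    reads: Iterable[Tuple[str, str]],
    index: Set[str],
    k: int,
    min_hits: int = 1,  # hypothetical threshold, tune for real data
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (name, sequence) reads into those sharing at least
    min_hits k-mers with the baits and those that do not."""
    hits, misses = [], []
    for name, seq in reads:
        n = sum(1 for km in kmers(seq, k) if km in index)
        (hits if n >= min_hits else misses).append((name, seq))
    return hits, misses


# Toy data: one "organelle" bait and three reads.
baits = ["ACGTACGGTCAGTTCA"]
reads = [("r1", "CGGTCAG"), ("r2", "AACTGACC"), ("r3", "TTTTTTTTT")]
index = build_bait_index(baits, k=5)
organelle_like, remainder = bait_reads(reads, index, k=5)
```

With distant references, a longer k-mer or a higher `min_hits` trades sensitivity for specificity, which is probably where the "not that perfect" behaviour comes from.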

Do you have a better way to do it? This method is far from perfect.




6.0 years ago by
Boston, United States
matted7.3k wrote:

The mirabait tool looks promising, and maybe you could get good results if you started with a wide enough net of possible mitochondrial sequences.

I would also suggest looking at read coverage as a way to separate these genome sources.  It varies by species and cell type, but typically the mitochondrial genome has a much higher copy number than the nuclear genome.  You could do a de novo assembly on all reads together, then try to stratify the contigs into your classes by their average read coverage.  You could also attempt an assembly-free approach by doing this at the k-mer level.
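As a rough sketch of the contig-level idea, the snippet below labels contigs whose average coverage is far above the typical (presumably nuclear) depth. It assumes that per-contig coverage has already been computed by some other tool, that most contigs are nuclear so the median approximates nuclear depth, and that a 5-fold cutoff is meaningful; the function name and the cutoff are hypothetical and would need tuning per species and tissue.

```python
from statistics import median
from typing import Dict, Tuple


def stratify_by_coverage(
    contig_cov: Dict[str, float],
    fold: float = 5.0,  # hypothetical cutoff relative to the baseline
) -> Tuple[float, Dict[str, str]]:
    """Label each contig by comparing its coverage to the median coverage.

    Assumes the median contig is nuclear, so the median serves as the
    nuclear baseline depth.
    """
    baseline = median(contig_cov.values())
    labels = {}
    for name, cov in contig_cov.items():
        if cov >= fold * baseline:
            labels[name] = "high_copy_candidate"  # organelle or repeat
        else:
            labels[name] = "nuclear_candidate"
    return baseline, labels


# Toy coverage table: four contigs near 30x and one at 900x.
coverage = {"c1": 30, "c2": 28, "c3": 33, "c4": 900, "c5": 31}
baseline, labels = stratify_by_coverage(coverage)
```

High-coverage contigs are only candidates: nuclear repeats also stand out this way, so a check against known organelle genes is still useful where possible.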

There will be some ambiguity caused by repetitive regions, but you may be able to make progress with this approach.  It has the advantage that it does not depend on finding accurate template/related mitochondrial sequences first, so if that task is hard or impossible, it may help.  Conversely, if you can find good mitochondrial examples, I think the template approach will be hard to beat.
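The assembly-free, k-mer-level variant could look roughly like the following. It is a minimal in-memory sketch, assuming a small read set, a hypothetical abundance `cutoff`, and canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement); real data sets would need a dedicated k-mer counter.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def read_kmers(seq: str, k: int) -> List[str]:
    """List the canonical k-mers of one read."""
    seq = seq.upper()
    return [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers over all reads."""
    counts: Counter = Counter()
    for seq in reads:
        counts.update(read_kmers(seq, k))
    return counts


def split_reads_by_abundance(
    reads: List[str], k: int, cutoff: float
) -> Tuple[List[str], List[str]]:
    """Split reads by the median abundance of their k-mers.

    Reads whose median k-mer count reaches the (hypothetical) cutoff
    are treated as high-copy candidates.
    """
    counts = count_kmers(reads, k)
    high, low = [], []
    for seq in reads:
        values = [counts[km] for km in read_kmers(seq, k)]
        abundance = median(values) if values else 0
        (high if abundance >= cutoff else low).append(seq)
    return high, low


# Toy data: one sequence seen ten times, another seen once.
reads = ["ACGTTGCA"] * 10 + ["GGATCCTA"]
high, low = split_reads_by_abundance(reads, k=4, cutoff=5)
```

Using the median rather than the mean makes a read less sensitive to a few k-mers that fall in repeats or contain sequencing errors.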

6.0 years ago by
United States
Brice Sarver3.6k wrote:

Assembly by Reduced Complexity (ARC) takes a set of (annotated!) targets and produces unbiased assemblies. The paper is in review and available on bioRxiv. Our motivation for developing ARC was to do exactly what you want to do: assemble targets for which no close reference sequence is available. It has been extremely helpful for non-model genomic work.

FYI: I'm sure you're aware that (some?) chloroplast genomes have an inverted, duplicated region (the inverted repeat). Assembling these regions is notoriously difficult, and it would make sense to mask one or both copies if the insert size of your libraries does not span them.
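To make the masking suggestion concrete, here is a toy sketch that looks for k-mers whose reverse complement occurs elsewhere in the same sequence and then masks one copy with `N`. The function names and the k-mer size are hypothetical, the search is exact-match only, and it is meant for short example sequences rather than full plastid genomes, where an alignment-based approach would be more appropriate.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_kmer_pairs(seq: str, k: int) -> List[Tuple[int, int]]:
    """Return (i, j) start positions, i < j, where the k-mer at i is the
    reverse complement of the k-mer at j (exact matches only)."""
    seq = seq.upper()
    positions: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    pairs = []
    for kmer, starts in positions.items():
        for j in positions.get(reverse_complement(kmer), []):
            for i in starts:
                if i < j:
                    pairs.append((i, j))
    return sorted(pairs)


def mask_region(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] with N characters, keeping the length."""
    return seq[:start] + "N" * (end - start) + seq[end:]


# Toy sequence: a 10 bp unit, a spacer, then the unit's reverse complement.
unit = "ACGGTCAATG"
toy = unit + "TTTTTTTT" + reverse_complement(unit)
pairs = find_inverted_kmer_pairs(toy, k=10)
masked = mask_region(toy, 18, 28)  # mask the second copy
```

Masking only one copy keeps the repeat sequence in the assembly while removing the ambiguity that short-insert libraries cannot resolve.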
