Question: Crazy coverage in assembly of chloroplast
3.3 years ago, int11ap1 wrote:

I am trying to assemble a chloroplast genome; the closest reference is 150 kb long. I have 4.5M read pairs (2x100 nt), which gives a coverage of about 6000X! My assemblies are horrible: they total about 2 million bases, compared with the 150 kb reference genome, and when I remap the reads against the contigs only 30% map.
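The coverage figure above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the estimate assumes that every read comes from the chloroplast, so it will overestimate the real depth if part of the library is nuclear or mitochondrial DNA.

```python
def expected_coverage(n_pairs, read_len, genome_size):
    """Mean depth if all paired reads originate from the target genome."""
    return n_pairs * 2 * read_len / genome_size


def pairs_for_coverage(target_cov, read_len, genome_size):
    """Number of read pairs needed to reach a target mean depth."""
    return round(target_cov * genome_size / (2 * read_len))


# Values taken from the question: 4.5M pairs, 2x100 nt, 150 kb reference.
depth = expected_coverage(4_500_000, 100, 150_000)
pairs_100x = pairs_for_coverage(100, 100, 150_000)
print(depth, pairs_100x)
```

With these numbers, a 100X subset corresponds to 75,000 read pairs, under the same assumption that all reads are chloroplast-derived.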

Should I scale my data down to 60X, either by digital normalization or by randomly sampling a fixed number of reads?

I took a subset of my data corresponding to 100X and assembled it with Velvet. When I map all my reads against the contigs, only 35% of the reads map.
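For reference, random subsampling of paired reads could look roughly like the sketch below. It is not the script that was actually used here; the function names and the fixed seed are assumptions, and it expects plain (uncompressed) 4-line FASTQ records with both files in the same order so that mates stay together.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as tuples of four lines."""
    while True:
        record = tuple(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def subsample_pairs(r1_handle, r2_handle, fraction, seed=42):
    """Keep each read pair with probability `fraction`, mates kept in sync."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        if rng.random() < fraction:
            yield rec1, rec2
```

To go from roughly 6000X to roughly 100X, `fraction` would be about 100 / 6000. The kept pairs can then be written out with `out1.writelines(rec1)` and `out2.writelines(rec2)`.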

What to do with this?

3.3 years ago, thackl wrote:

High coverage is not unusual for chloroplasts in plant sequencing data. Random sampling could work, but I would additionally run a filter on the sampled set to remove low-coverage reads, using something like Quake, for example; this will remove most of the "genomic contamination". Then assemble it with SPAdes rather than Velvet. You can further analyse the scaffolds with Bandage, extract the connected cluster that makes up the chloroplast, and use it to filter your contig set further. Even with good data, the chloroplast will usually not assemble into a single contig but into at least 3 contigs: one for the large single-copy region (LSC), one for the small single-copy region (SSC), and one copy of the inverted repeat.


Low-coverage reads or low-coverage k-mers?

Reply written 3.3 years ago by int11ap1

To be exact: reads composed mostly of low-coverage k-mers. I think BBNorm can perform k-mer-coverage-based read binning quite efficiently.
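To illustrate the idea of dropping reads that consist mostly of low-coverage k-mers, here is a toy in-memory sketch. It is not how BBNorm or Quake work internally; the function names, `k=21` and the `min_median` threshold are all assumptions chosen for illustration, and real data would need a memory-efficient k-mer counter.

```python
from collections import Counter
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def read_kmers(seq, k):
    """List the canonical k-mers of one read."""
    return [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def keep_high_coverage(reads, k=21, min_median=20):
    """Keep reads whose median k-mer count across the data set is >= min_median."""
    counts = Counter()
    for seq in reads:
        counts.update(read_kmers(seq, k))
    kept = []
    for seq in reads:
        kms = read_kmers(seq, k)
        if kms and median(counts[km] for km in kms) >= min_median:
            kept.append(seq)
    return kept
```

Because the chloroplast is present at a much higher copy number than the nuclear genome, its k-mers should show much higher counts, so a threshold set between the two depth levels should remove most nuclear reads. The right threshold depends on the data and is best chosen from a k-mer count histogram.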

Reply written 3.3 years ago by thackl
3.2 years ago, nicolasdierckxsens wrote:

Hi, I developed a new assembler for plastids, and it should assemble the chloroplast genome into one circular contig. I will upload the assembler in the next few weeks, if you are interested.

I could already upload a beta version next week; it probably has some bugs, but all tests were successful. I assembled 10 chloroplasts, each into one contig and within 30 min. The high coverage is no problem for this assembler, and you don't need a reference. For the paper I assembled the chloroplasts of Arabidopsis and rice; both were 100% accurate, so you should obtain a high-quality assembly. But I would recommend subsampling the file a bit, because 6000X is a lot :) It will slow down the assembly and require more memory. I can send a script for it.
