Question: improving de novo genome assembly
4.7 years ago by
xvazquezc10
Australia
xvazquezc10 wrote:

Hi,

I have a couple of fungal genomes that I'm reassembling from scratch, as I didn't realise the first time how many Illumina adapters were still present in my reads.

I assembled them with Velvet, iterating over the k-mer length to find the optimum assembly based on the statistics reported by abyss-fac. I recently read that assemblies can be improved without further sequencing by at least a couple of different methods:

  1. Map the reads against the assembly, extract the "properly paired" reads and reassemble them with the same k-mer length. Take a look here
  2. Use specific software for this, such as REAPR(?)
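
For reference, "properly paired" in option 1 corresponds to bit 0x2 of the SAM FLAG field, which the aligner sets when it considers both mates mapped in the expected orientation and distance. A minimal Python sketch of that filter, assuming the alignments are available as SAM text lines (the function name and the toy records are made up for illustration):

```python
PROPER_PAIR = 0x2      # SAM flag bit: pair aligned properly according to the aligner
UNMAPPED = 0x4         # SAM flag bit: this read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def properly_paired_names(sam_lines):
    """Return the names of reads that have a properly paired primary alignment."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the FLAG
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if flag & PROPER_PAIR:
            names.add(fields[0])  # column 1 is the read name
    return names


# Toy records (hypothetical): readA is a proper pair, readB has its mate on
# another contig, readC is unmapped.
toy_sam = [
    "@SQ\tSN:contig1\tLN:1000",
    "readA\t99\tcontig1\t10\t60\t50M\t=\t200\t240\t*\t*",
    "readA\t147\tcontig1\t200\t60\t50M\t=\t10\t-240\t*\t*",
    "readB\t65\tcontig1\t30\t60\t50M\tcontig2\t5\t0\t*\t*",
    "readC\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(sorted(properly_paired_names(toy_sam)))
```

On a real BAM file the same selection is what `samtools view -f 2` does; the sketch only shows what the flag test means.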

I proceeded with #1, and while the reassembly of one genome gave exactly the same assembly statistics, the other changed quite a bit (top: initial assembly; bottom: reassembly):

n       |n:500  |n:N50  |min    |N80    |N50    |N20    |E-size |max    |sum    |name
------  |------ |------ |------ |------ |------ |------ |------ |------ |------ |------
7290    |6860   |1122   |503    |4914   |11504  |22431  |14693  |124406 |43.73e6        |T2paper/velvet/k169/contigs.fa
10638   |8598   |1437   |500    |3980   |9021   |17417  |11669  |124406 |43.75e6        |T2paper/velvet/rek169/contigs.fa
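
For readers unfamiliar with the columns: N50 is the contig length such that contigs of that length or longer contain at least half of the assembled bases (N80 and N20 likewise for 80% and 20%), and E-size is commonly defined as the sum of squared contig lengths divided by the total length. A small Python sketch of those definitions, using made-up contig lengths; abyss-fac's exact conventions (for example its minimum contig length cutoff) may differ in detail:

```python
def nx(lengths, x):
    """Length L such that contigs of length >= L hold at least x% of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    return 0  # empty input


def e_size(lengths):
    """Sum of squared contig lengths divided by the total length."""
    total = sum(lengths)
    return sum(length * length for length in lengths) / total if total else 0.0


# Hypothetical contig lengths, not taken from the assemblies above.
toy = [100, 80, 60, 40, 20]
print("N50:", nx(toy, 50), "N80:", nx(toy, 80), "N20:", nx(toy, 20))
print("E-size:", round(e_size(toy), 2))
```

Read this way, the table shows the reassembly has more contigs of at least 500 bp (8598 against 6860) and a lower N50 (9021 against 11504) for almost the same total length, which suggests it is more fragmented rather than improved.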

So, the question...

  • Is this step common?
  • Is there an easy way to compare them side by side, or to evaluate the assemblies without relying on those numbers?

Thank you in advance,

Xabier


I think step #1 from the posted link does not refer to extracting the reads and reassembling them, but rather to estimating the fragment length from the paired-end reads and then reassembling with that information.

Edit: They do say "Now we created a fairly good assembly, but lets see if we can do it better. Lets try to map the reads to the assembly and then only use mapped reads for another assembly.", but, as others said, I don't think this will help the assembly.

written 4.7 years ago by Adrian Pelin
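
The fragment-length estimate mentioned above can be read off the mapped pairs: column 9 of a SAM record (TLEN) holds the observed template length. A rough Python sketch, assuming SAM text lines as input and taking the median over properly paired primary alignments with positive TLEN, so that each pair is counted once; the function name and toy records are illustrative only:

```python
import statistics


def median_fragment_length(sam_lines):
    """Median TLEN over properly paired primary alignments, or None if there are none."""
    tlens = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: FLAG
        tlen = int(fields[8])  # column 9: observed template length
        is_proper = bool(flag & 0x2)
        is_primary = not (flag & 0x900)  # neither secondary nor supplementary
        if is_proper and is_primary and tlen > 0:  # leftmost mate of each pair only
            tlens.append(tlen)
    return statistics.median(tlens) if tlens else None


# Three hypothetical proper pairs with template lengths 240, 260 and 300.
toy_pairs = [
    "r1\t99\tcontig1\t10\t60\t50M\t=\t200\t240\t*\t*",
    "r1\t147\tcontig1\t200\t60\t50M\t=\t10\t-240\t*\t*",
    "r2\t99\tcontig1\t50\t60\t50M\t=\t260\t260\t*\t*",
    "r2\t147\tcontig1\t260\t60\t50M\t=\t50\t-260\t*\t*",
    "r3\t99\tcontig1\t90\t60\t50M\t=\t340\t300\t*\t*",
    "r3\t147\tcontig1\t340\t60\t50M\t=\t90\t-300\t*\t*",
]
print(median_fragment_length(toy_pairs))
```

With Velvet, an estimate like this would typically be passed to velvetg through its `-ins_length` option.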
4.7 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

You should trim your adapters with an adapter-trimming tool like BBDuk first. You can get a rough evaluation of your assembly quality with tools like Quast; generally, the more long genes (1500bp+ and 3000bp+) are called, the better the assembly.


That's why I'm redoing the assemblies. The first time, I used SolexaQA but didn't search for the adapters.

I now use Trim Galore! for that. It does both quality and adapter trimming.

written 4.7 years ago by xvazquezc
4.7 years ago by
Rayan Chikhi1.4k
France, Lille, CNRS
Rayan Chikhi1.4k wrote:

No, I don't think that this step #1 is common. In general, reassembling using only the reads which properly mapped to contigs is unlikely to give you a better assembly.

4.7 years ago by
Spain. Universidad de Córdoba
Antonio R. Franco4.5k wrote:

A colleague of mine has been trying to close a 6 Mb bacterial genome for years. It turned out that this genome had too many repeated sequences, and that means trouble.

He eventually closed the circle by doing PacBio sequencing and running a hybrid assembly.

This is something you should consider very seriously.
