What are the recommended methods for merging de-novo assemblies?
5.5 years ago
O.rka ▴ 710

I have several fasta files of assemblies from different samples. Is there a recommended method for merging de-novo assemblies? I have 88 assemblies, 2 of which are of seemingly high quality. The quast results for all of the assemblies are shown below. Many of the reads are short and I feel like they could be collapsed down from some of the contigs from the other assemblies. Are there recommended methods for collapsing all of the assemblies into a consensus assembly? I stumbled across metassembler but haven't heard much about it. Any advice would be greatly appreciated.

[Image: quast results for all of the assemblies]

Assembly sequencing • 4.0k views

Do you have the original fastqs for all the assemblies?

Yes, I have the original fastq files which include R1, R2, and singletons.

Yep, Damian's answer was where I was headed. Just concatenate all your R1s and all your R2s and assemble that.

Depending on the quality, you may not want to use all of them. If you have some which are lower quality, there’s no point ‘tainting’ your other reads with them.

88 lots of fastqs may also lead to too much depth, in which case you may need to downsample.
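
A minimal Python sketch of the concatenate-then-downsample idea, assuming uncompressed, 4-line-per-record FASTQ files and that the R1 and R2 files are listed in the same sample order. The function names (`concatenate`, `downsample_pairs`) and the `fraction` parameter are made up for illustration; they are not from any tool mentioned in this thread.

```python
import random
import shutil


def concatenate(paths, out_path):
    """Append files byte-for-byte in the given order.

    Give the R1 list and the R2 list in the same sample order so pairs stay aligned.
    """
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes 4 lines per record)."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:
            return
        yield record


def downsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=0):
    """Keep each read pair with probability `fraction`; returns the number kept.

    One random draw decides the fate of both mates, so R1 and R2 stay in sync.
    """
    rng = random.Random(seed)
    kept = 0
    with open(r1_in) as f1, open(r2_in) as f2, \
            open(r1_out, "w") as o1, open(r2_out, "w") as o2:
        for rec1, rec2 in zip(read_fastq(f1), read_fastq(f2)):
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept
```

Making one decision per pair, rather than sampling R1 and R2 independently, is what keeps the two output files usable as paired input for an assembler.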

Do you have 88 assemblies of the same organism?

Yes, I have 88 de-novo assemblies based on the same reference.

I have several fasta files of assemblies from different samples.

It is implied, but not clear to me, that you have 88 sequencing samples and that these samples are all from the same reference. Could you clarify your experimental design?

I will list some possibilities from my incomplete understanding of your question:

Do you have one fastq dataset, which you assembled several times, with different methods and parameters, to get 88 assemblies?

Or do you have 88 fastq datasets, all from different isolates of the same organism, and assembled these 88 isolates separately?

Or (finally) do you have 88 fastq datasets, all from the same isolate, and assembled these 88 datasets separately?

Yes, I have 88 different metagenomic samples. I've mapped to a collection of taxa that I'm interested in (a very small list) to get all species in the genus. I am assuming that there will be slight differences in community from each sample. I've tried coassembling these but, as another commenter mentioned, I had to subsample, and that decreased the amount of data I had for assembling rare species. I have a feeling that there will be overlaps in contigs among different sample sets. Are there any tools that can look for this and extend contigs that have high overlap?

Try some of these perhaps?

https://omictools.com/assembly-reconciliation-category

I can’t personally speak to any of them as I’ve never tried what you’re attempting.

Is there any particular reason why you constructed 88 assemblies? As Damian Kao says, why not simply do one assembly with all the data?

Hi O.rka,

Have you tried Metassembler on your datasets?

I assembled 7 different fish genomes with different k-mer values (ABySS, SPAdes, Velvet).

I then used CD-HIT to remove the redundant contigs, which gave me a less redundant set of contigs.

Can I use Metassembler to merge the assemblies?
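
To make the redundancy-removal step concrete, here is a toy Python sketch. It is not how CD-HIT works (CD-HIT clusters sequences at an identity threshold); this only drops contigs that are exactly contained, on either strand, in a longer contig that has already been kept. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def remove_contained(contigs):
    """Drop contigs exactly contained in a longer, already-kept contig.

    `contigs` maps contig name -> sequence. Containment is checked on both
    strands. Returns a dict of the surviving contigs (upper-cased).
    """
    kept = {}
    # Visit longest contigs first so shorter ones are compared against them.
    by_length = sorted(contigs.items(), key=lambda item: len(item[1]), reverse=True)
    for name, seq in by_length:
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue
        kept[name] = seq
    return kept


contigs = {
    "a": "ACGTACGTTT",
    "b": "GTACG",    # exact substring of "a"
    "c": "AAACGT",   # reverse complement of the end of "a"
    "d": "GGGGG",    # unrelated
}
survivors = remove_contained(contigs)
```

In this example only "a" and "d" survive. The all-against-all substring check is quadratic, so treat it as an illustration rather than something to run on full assemblies.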

You're definitely going to get an answer if you merge the assemblies, but TBH I steer clear of this type of workflow because it can introduce artifacts if not done very carefully. I haven't tried Metassembler yet so I can't comment.

Check out IMAP: https://github.com/jkimlab/IMAP

5.5 years ago

You have the original fastqs for all 88 assemblies and they are all of the same organism. You can just perform a de novo assembly of all the raw fastqs instead of trying to merge the 88 assemblies.

I kind of mentioned it in a comment above (after you posted this) but to do this I had to subsample and I lost a lot of information. I mainly want to group contigs from different assemblies that overlap and then extend them. I have a lot of small contigs and I wonder if they can be treated like reads?
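
As a rough illustration of what "treating contigs like reads" would mean, the Python sketch below joins two contigs when the end of one exactly matches the start of the other. The function names and the `min_overlap` default are assumptions for illustration only. Contigs from different samples usually carry strain-level differences, so an exact-match rule like this will often fail on real data, and loosening it risks building chimeric sequences.

```python
def suffix_prefix_overlap(a, b, min_overlap):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(a), len(b))
    for length in range(longest_possible, min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def merge_if_overlapping(a, b, min_overlap=20):
    """Return `a` extended by `b` if they overlap end-to-start, else None."""
    overlap = suffix_prefix_overlap(a, b, min_overlap)
    if overlap == 0:
        return None
    return a + b[overlap:]


left = "ACGTACGGATTC"
right = "GGATTCAAGT"
merged = merge_if_overlapping(left, right, min_overlap=4)
```

Here the 6-base overlap "GGATTC" is found and `merged` is "ACGTACGGATTCAAGT". A real workflow would also need to consider reverse complements and ambiguous (repeat-induced) overlaps.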

So you have 88 sets of metagenomic reads? I don't see why you wouldn't just merge them all, treat them as one huge metagenomic read set, and assemble that with a metagenome-aware assembler (metaSPAdes or something). Then, from the resulting assembly, pull out the contigs you want based on reference mapping.
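
The last step, pulling out the contigs of interest, can be sketched in Python as below. It assumes you already have a list of contig IDs that mapped to your references (how you derive that list depends on your mapper's output format); `read_fasta` and `select_contigs` are hypothetical helper names, not part of any assembler.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None and line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def select_contigs(fasta_handle, wanted_ids):
    """Return {name: sequence} for contigs whose name is in `wanted_ids`."""
    wanted = set(wanted_ids)
    return {name: seq for name, seq in read_fasta(fasta_handle) if name in wanted}


# Made-up contig names and sequences, purely for demonstration.
assembly = io.StringIO(
    ">contig_1 length=8\nACGTACGT\n"
    ">contig_2 length=12\nGGGGCCCC\nAAAA\n"
    ">contig_3 length=4\nTTTT\n"
)
hits = ["contig_2", "contig_3"]  # IDs assumed to come from reference mapping
selected = select_contigs(assembly, hits)
```

Matching on the first word of the header is a common convention, but check that your mapper reports contig names the same way before relying on it.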

This doesn’t seem like a good idea to me because you’ll be introducing variants from different assemblies. You’ll be losing information in some areas and adding erroneous information in others.

You're probably right. I'm trying to figure out the best way to handle all of these small contigs. I ran them through CheckM and there are definitely markers on them. It also makes the binning a lot more complicated but I can't throw them out.

If you’re not looking to close the assemblies, and the contigs pass reasonable quality control, you can just leave them.

Alternatively, scaffold your contigs so that your small contigs are at least positioned within a final large contiguous assembly. You won’t gain more information, but you’ll make a contiguous sequence at least.
