Question: What are the recommended methods for merging de-novo assemblies?
O.rka wrote (11 months ago):

I have several FASTA files of assemblies from different samples. Is there a recommended method for merging de novo assemblies? I have 88 assemblies, 2 of which appear to be of high quality. The QUAST results for all of the assemblies are shown below. Many of the contigs are short, and I feel like they could be collapsed into contigs from some of the other assemblies. Are there recommended methods for collapsing all of the assemblies into a consensus assembly? I stumbled across Metassembler but haven't heard much about it. Any advice would be greatly appreciated.

[Image: QUAST results for all assemblies]

Tags: sequencing, assembly
Question asked 11 months ago by O.rka (last modified 11 months ago by Damian Kao).
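Since the question leans on the QUAST summary and on many contigs being short, here is a small sketch of how N50, one of QUAST's headline metrics, is computed from a list of contig lengths. It is the standard textbook definition written out by hand, not QUAST's own code, and the function name is just for illustration.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

print(n50([100, 200, 300, 400, 500]))  # 400
```

An assembly dominated by many short contigs drags this number down, which is why it is a quick way to compare the 88 assemblies against the 2 good ones.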

Do you have the original fastqs for all the assemblies?

Reply by Joe, 11 months ago.

Yes, I have the original fastq files which include R1, R2, and singletons.

Reply by O.rka, 11 months ago.

Yep, Damian's answer was where I was headed. Just concatenate all your R1s and all your R2s and assemble that.
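The concatenation step can be sketched as below. The important detail is that the R1 and R2 files are concatenated in the same sample order so that mates stay paired. It assumes a hypothetical naming scheme (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` in one directory), and the helper names are made up for illustration. Gzip files can be byte-concatenated because a file made of several gzip members is still valid gzip.

```python
import shutil
from pathlib import Path

def concat_files(paths, out_path):
    """Byte-concatenate files in the given order (works for .gz members)."""
    with open(out_path, "wb") as out:
        for p in paths:
            with open(p, "rb") as fh:
                shutil.copyfileobj(fh, out)

def concat_pairs(sample_dir, out_prefix):
    """Concatenate all R1s and all R2s in matching sample order.
    Assumes the hypothetical pattern <sample>_R1.fastq.gz / <sample>_R2.fastq.gz."""
    r1 = sorted(Path(sample_dir).glob("*_R1.fastq.gz"))
    # Derive each mate file from its R1 name so the two lists line up.
    r2 = [p.with_name(p.name.replace("_R1.", "_R2.")) for p in r1]
    missing = [str(p) for p in r2 if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing mate files: {missing}")
    concat_files(r1, f"{out_prefix}_R1.fastq.gz")
    concat_files(r2, f"{out_prefix}_R2.fastq.gz")
    return r1, r2
```

On a Unix shell, `cat` over the same two ordered file lists does the same thing; the point of the script is only to guarantee the ordering matches. Singletons would need to be handled separately.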

Depending on the quality, you may not want to use all of them. If you have some which are lower quality, there’s no point ‘tainting’ your other reads with them.
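One rough way to decide which samples are lower quality is to compare the mean base quality of each FASTQ. The sketch below assumes plain 4-line FASTQ records with Phred+33 encoding and only looks at the first `max_reads` records; the 30.0 cut-off is an arbitrary placeholder, not a recommendation, and the function names are hypothetical.

```python
import gzip

def mean_phred(fastq_path, offset=33, max_reads=100_000):
    """Mean per-base Phred quality over the first max_reads records.
    Assumes 4-line FASTQ records and Phred+33 encoding."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total, n_bases = 0, 0
    with opener(fastq_path, "rt") as fh:
        for i, line in enumerate(fh):
            if i // 4 >= max_reads:
                break
            if i % 4 == 3:  # the quality line of each record
                qual = line.rstrip("\n")
                total += sum(ord(c) - offset for c in qual)
                n_bases += len(qual)
    return total / n_bases if n_bases else 0.0

def rank_samples(paths, min_mean_q=30.0):
    """Score every file and return (scores, files passing the cut-off, best first)."""
    scores = {str(p): mean_phred(p) for p in paths}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    keep = [p for p, q in ranked if q >= min_mean_q]
    return scores, keep
```

Mean quality is a crude proxy; it will not catch contamination or adapter problems, so it is best treated as a first pass before looking at a proper QC report.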

88 sets of FASTQs may also give you too much depth, in which case you may need to downsample.
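If downsampling is needed, the key constraint is that R1 and R2 must be subsampled together. A minimal sketch, assuming uncompressed 4-line FASTQ files whose records are in the same order in both mates: it makes one random draw per read pair and keeps or drops both mates together. The fraction and seed are placeholders, and the function names are hypothetical.

```python
import random
from itertools import islice

def read_records(fh):
    """Yield 4-line FASTQ records as tuples of lines."""
    while True:
        rec = tuple(islice(fh, 4))
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record")
        yield rec

def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=42):
    """Keep each read pair with probability `fraction`; a single draw per
    pair keeps the two output files in sync. Returns pairs kept."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = 0
    with open(r1_in) as a, open(r2_in) as b, \
         open(r1_out, "w") as oa, open(r2_out, "w") as ob:
        # zip() stops at the shorter file, so mismatched inputs are truncated.
        for rec1, rec2 in zip(read_records(a), read_records(b)):
            if rng.random() < fraction:
                oa.writelines(rec1)
                ob.writelines(rec2)
                kept += 1
    return kept
```

Uniform subsampling thins rare organisms as much as abundant ones, which is exactly the loss of rare-species data described further down the thread; digital normalization is the usual alternative when that matters.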

Reply by Joe, 11 months ago.

Do you have 88 assemblies of the same organism?

Reply by h.mon, 11 months ago.

Yes, I have 88 de-novo assemblies based on the same reference.

Reply by O.rka, 11 months ago.

"I have several fasta files of assemblies from different samples."

It is implied, but not clear to me, that you have 88 sequencing samples, all from the same reference organism. Could you clarify your experimental design?

I will list some possibilities from my incomplete understanding of your question:

Do you have one fastq dataset, which you assembled several times, with different methods and parameters, to get 88 assemblies?

Or do you have 88 fastq datasets, all from different isolates of the same organism, and assembled these 88 isolates separately?

Or (finally) do you have 88 fastq datasets, all from the same isolate, and assembled these 88 datasets separately?

Reply by h.mon, 11 months ago.

Yes, I have 88 different metagenomic samples. I've mapped the reads to a small collection of taxa I'm interested in to get all species in the genus. I'm assuming there will be slight differences in community composition between samples. I've tried co-assembling these, but as another commenter mentioned, I had to subsample, and that reduced the amount of data available for assembling rare species. I suspect there will be overlapping contigs across the different sample sets. Are there any tools that can find these and extend contigs that overlap substantially?
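"Find overlapping contigs and extend them" is essentially what an overlap-based assembler does. A toy greedy version is sketched below: repeatedly find the pair of contigs with the longest exact suffix-prefix overlap of at least `min_len` bases and merge them. This is illustrative only and is not how the assembly-reconciliation tools linked below work: it assumes error-free exact overlaps, ignores reverse complements and contained contigs, and is quadratic per merge, so it would not scale to 88 real assemblies. The function names and `min_len` default are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that exactly matches a prefix
    of `b`, if at least `min_len` long; otherwise 0."""
    if len(b) < min_len:
        return 0
    start = 0
    while True:
        # Candidate start positions: where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # leftmost hit gives the longest overlap
        start += 1

def greedy_merge(contigs, min_len=20):
    """Merge the best-overlapping pair until no pair overlaps by >= min_len."""
    contigs = list(contigs)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```

For example, `greedy_merge(["ACGTACGTTTGCA", "TTGCAGGCATCA"], min_len=5)` joins the two on their shared `TTGCA` into `["ACGTACGTTTGCAGGCATCA"]`. With real contigs from different samples, strain-level differences mean exact overlaps will often fail, which is one reason the replies below favour co-assembly from reads.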

Reply by O.rka, 11 months ago.

Try some of these perhaps?

https://omictools.com/assembly-reconciliation-category

I can’t personally speak to any of them as I’ve never tried what you’re attempting.

Reply by Joe, 11 months ago.

Is there any particular reason why you constructed 88 assemblies? As Damian Kao says, why not simply do one assembly with all the data?

Reply by lieven.sterck, 11 months ago.
Answer by Damian Kao (USA), 11 months ago:

You have the original fastqs for all 88 assemblies and they are all of the same organism. You can just perform a de novo assembly of all the raw fastqs instead of trying to merge the 88 assemblies.

Answer written 11 months ago by Damian Kao.

I mentioned it in a comment above (after you posted this), but to do this I had to subsample, and I lost a lot of information. Mainly I want to group contigs from different assemblies that overlap and then extend them. I have a lot of small contigs, and I wonder whether they can be treated like reads.
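Before treating small contigs like reads, a cheap first step is to drop contigs that are already contained in a longer contig from another assembly, since those add no sequence. A minimal sketch, assuming exact containment on either strand is good enough; real dereplication tools allow mismatches and use indexes rather than this quadratic substring search. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def remove_contained(contigs):
    """Keep only contigs not found exactly (either strand) inside a contig
    already kept. Longest contigs are processed first."""
    kept = []
    for seq in sorted(contigs, key=len, reverse=True):
        rc = revcomp(seq)
        if not any(seq in k or rc in k for k in kept):
            kept.append(seq)
    return kept
```

For example, `remove_contained(["ACGTTGCA", "GTTG", "CAAC", "TTTT"])` keeps `["ACGTTGCA", "TTTT"]`: `GTTG` occurs directly in the long contig, and `CAAC` occurs there as its reverse complement.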

Reply by O.rka, 11 months ago.

So you have 88 sets of metagenomic reads? I don't see why you wouldn't just merge them all, treat them as one huge metagenomic read set, and assemble that with a metagenome-aware assembler (metaSPAdes or something similar). Then, from the resulting assembly, pull out the contigs you want based on reference mapping.
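The two steps suggested here can be sketched as: build one metaSPAdes command for the concatenated reads, then keep the contigs that align well to the reference set. The command uses metaspades.py's `-1`, `-2`, `-o`, `-t` (threads) and `-m` (memory in GB) options; the thread and memory values are placeholders. The filter assumes the contigs were aligned to the references with minimap2 in its default PAF output (for example `minimap2 -x asm5 refs.fasta coasm/contigs.fasta > aln.paf`), and the coverage and mapping-quality cut-offs are arbitrary assumptions. The function names are hypothetical.

```python
from collections import defaultdict

def metaspades_cmd(r1, r2, outdir, threads=16, mem_gb=250):
    """Argument list for one co-assembly of the concatenated read set."""
    return ["metaspades.py", "-1", r1, "-2", r2, "-o", outdir,
            "-t", str(threads), "-m", str(mem_gb)]

def contigs_hitting_reference(paf_lines, min_aligned_frac=0.5, min_mapq=20):
    """Names of contigs whose PAF alignments (MAPQ >= min_mapq) cover at
    least min_aligned_frac of the contig length; overlaps are not double counted."""
    intervals = defaultdict(list)
    lengths = {}
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip blank or malformed lines
        # PAF columns: 1 query name, 2 query length, 3-4 query start/end, 12 MAPQ
        qname, qlen = f[0], int(f[1])
        qstart, qend, mapq = int(f[2]), int(f[3]), int(f[11])
        if mapq < min_mapq:
            continue
        lengths[qname] = qlen
        intervals[qname].append((qstart, qend))
    keep = set()
    for qname, ivs in intervals.items():
        ivs.sort()
        covered = 0
        cur_s, cur_e = ivs[0]
        for s, e in ivs[1:]:
            if s > cur_e:   # disjoint interval: bank the current run
                covered += cur_e - cur_s
                cur_s, cur_e = s, e
            else:           # overlapping interval: extend the current run
                cur_e = max(cur_e, e)
        covered += cur_e - cur_s
        if covered / lengths[qname] >= min_aligned_frac:
            keep.add(qname)
    return keep
```

The command list can be passed to `subprocess.run(..., check=True)`. metaSPAdes writes its contigs to `contigs.fasta` inside the output directory, and the returned names can then be used to extract the contigs of interest from that file.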

Reply by Damian Kao, 11 months ago (edited).

This doesn’t seem like a good idea to me because you’ll be introducing variants from different assemblies. You’ll be losing information in some areas and adding erroneous information in others.

Reply by Joe, 11 months ago.

You're probably right. I'm trying to figure out the best way to handle all of these small contigs. I ran them through CheckM and there are definitely markers on them. It also makes the binning a lot more complicated but I can't throw them out.

Reply by O.rka, 11 months ago.

If you’re not looking to close the assemblies, and the contigs pass reasonable quality control, you can just leave them.
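What counts as "reasonable quality control" for contigs depends on the project. A minimal sketch of a contig-level filter, assuming a minimum length and a maximum fraction of ambiguous `N` bases as the criteria; both thresholds are placeholder assumptions, and the function names are hypothetical. It also includes a small FASTA reader so it runs without extra libraries.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = (line[1:].split() or [""])[0], []
        else:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)

def passes_qc(seq, min_len=1000, max_n_frac=0.05):
    """True if the contig is long enough and not dominated by N bases."""
    if not seq:
        return False
    n_frac = seq.upper().count("N") / len(seq)
    return len(seq) >= min_len and n_frac <= max_n_frac
```

Usage: `kept = [(n, s) for n, s in read_fasta(open("contigs.fasta")) if passes_qc(s)]`. Since CheckM reportedly finds markers on the short contigs, a length cut-off like this one should be chosen with that in mind rather than copied as-is.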

Alternatively, scaffold your contigs so that your small contigs are at least positioned within a final large contiguous assembly. You won’t gain more information, but you’ll make a contiguous sequence at least.
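Reference-guided scaffolding of this kind can be sketched as ordering contigs by where they align on the reference, flipping those that align to the minus strand, and joining them with runs of `N` to mark gaps of unknown size. This is a simplified illustration, not any particular scaffolder: it assumes each contig already has a single placement `(reference_start, strand)` (for example from its best alignment), a single reference sequence, and an arbitrary fixed gap of 100 Ns. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def scaffold(contigs, placements, gap=100):
    """Join placed contigs in reference order, separated by `gap` Ns.
    contigs: name -> sequence; placements: name -> (ref_start, strand).
    Returns (scaffold_sequence, names_of_unplaced_contigs)."""
    ordered = sorted(placements.items(), key=lambda kv: kv[1][0])
    parts = []
    for name, (_, strand) in ordered:
        seq = contigs[name]
        # Minus-strand hits are flipped so all pieces read in reference orientation.
        parts.append(revcomp(seq) if strand == "-" else seq)
    unplaced = [n for n in contigs if n not in placements]
    return ("N" * gap).join(parts), unplaced
```

For example, with `contigs = {"a": "AAAC", "b": "GGGT", "c": "TTTT"}` and `placements = {"b": (500, "+"), "a": (10, "-")}`, `scaffold(contigs, placements, gap=3)` returns `("GTTTNNNGGGT", ["c"])`. As the reply says, this adds no new sequence information; it only places the contigs relative to each other.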

Reply by Joe, 11 months ago.