Question: What is the best way to merge multiple fasta files containing contigs?
22 months ago, anastasia.gs17 wrote:


I have been working with some metagenomic samples and have assembled each of them individually with SPAdes. Now I would like to merge the contigs.fasta files from my samples (I want to proceed with mapping and binning using anvi'o). What is the most appropriate way to do that?

fasta assembly anvi’o • 976 views
modified 22 months ago by zx8754 (9.7k) • written 22 months ago by anastasia.gs17 (0)


written 22 months ago by ATpoint (41k)

But the OP might have identical sequence names in different files. For each output file (lname here), try sed -i 's/^>/>lname_/' lname.fasta to make each sequence name unique, and then cat the files.
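
A minimal sketch of the rename-then-concatenate idea above, written as shell functions so that the input files are left untouched. The function names merge_contigs and duplicate_headers, the "_" separator, and the assumption that each input is named <sample>.fasta (with a sample name free of characters that are special to sed, such as "/" or "&") are illustrative choices, not something stated in this thread.

```shell
# merge_contigs OUT IN1.fasta [IN2.fasta ...]
# Writes all input sequences to OUT, prefixing every header with the
# sample name taken from the input file name (hypothetical convention:
# sampleA.fasta -> ">sampleA_<original header>").
merge_contigs() {
    out="$1"
    shift
    : > "$out"                                # create or truncate the output
    for f in "$@"; do
        sample="$(basename "$f" .fasta)"      # sampleA.fasta -> sampleA
        # Only header lines start with '>', so sequence lines pass through as-is.
        sed "s/^>/>${sample}_/" "$f" >> "$out"
    done
}

# duplicate_headers FILE
# Prints any header line that occurs more than once; no output means
# every sequence name in FILE is unique.
duplicate_headers() {
    grep '^>' "$1" | sort | uniq -d
}
```

For example, merge_contigs merged.fa sampleA.fasta sampleB.fasta followed by duplicate_headers merged.fa. Give the output a name that a *.fasta glob over the inputs will not match, otherwise the merged file may be read as one of its own inputs. Note that this only makes the names unique; it does nothing about redundant contigs across samples.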

written 22 months ago by Asaf (8.4k)

I think what the OP is asking for is a way to merge-assemble the different FASTA files. Is that correct, anastasia.gs17?

written 22 months ago by lieven.sterck (8.9k)

The real problem with merging contigs from different assemblies into a single FASTA file is that you will likely end up with multiple contigs that match homologous parts of identical or very closely related population genomes. Read recruitment against a reference with that kind of redundancy will dilute the short reads across the redundant contigs and make it impossible to reliably reconstruct genomes later.

You can try to reassemble these contigs to get a final non-redundant set of contigs for read recruitment, do binning on the individual samples and then collapse redundancy with a tool like dRep, or start over with a co-assembly if your experimental design and/or your system permits that.

Best wishes, Meren.

PS: I got a username when I saw cat as an answer, but I will not be able to follow the discussion any further. If you have more specific questions regarding your system and best practices for genome-resolved metagenomics, feel free to try the anvi'o Slack (you can use the Slack button on the anvi'o web page to get an invitation).

written 22 months ago by a.murat.eren (10)

If you want to merge overlapping sequences into one, try scaffolding software such as SSPACE or SOAPdenovo. If you only want them in the same FASTA file, then use cat, as @ATpoint and @Asaf just said.

written 22 months ago by b.bearmi (10)