Question: How to remove circular contigs from assembly.fasta
3 months ago, kristina.mahan wrote:

I have a list of >100 circular contigs that I would like to remove from my de novo genome assembly.fasta. How can I remove these contigs from the assembly.fasta using a text file with the contig names/numbers? Or is there another way?

3 months ago, Mark wrote:

If you already know which contigs are circular, you can use the really cool seqkit tool. The grep subcommand is the perfect tool for this job.

seqkit grep assembly.fasta -n -v -f circular_contigs.txt > assembly_clean.fasta

-n matches against the full sequence name (the whole header line after the >) rather than just the ID (the text up to the first whitespace), so each entry in the list has to match its header exactly

-v inverts the match, so only the contigs that are not in the list (i.e. the non-circular ones) are written out

-f specifies the file to read the patterns from (in this case the circular contig header names)

circular_contigs.txt is a list (one header per line) that identifies the circular contigs to be removed

> assembly_clean.fasta seqkit writes to the terminal (stdout) by default, so this last bit redirects the output into a new file

More info here:

Hope that helps
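
If seqkit isn't installed, plain awk can do roughly the same filtering. This is only a minimal sketch and not part of seqkit: remove_contigs is a made-up helper name, and it assumes the list has one header per line, written exactly as in the FASTA but without the leading >.

```shell
# remove_contigs LIST FASTA
# Hypothetical helper (not a seqkit command): prints every FASTA record
# whose full header (the text after '>') is NOT listed in LIST.
remove_contigs() {
    awk '
        # First file: remember each non-empty line as a header to drop.
        FILENAME == ARGV[1] { if ($0 != "") drop[$0] = 1; next }
        # Header line of the FASTA: decide whether to keep this record.
        /^>/ { keep = !(substr($0, 2) in drop) }
        # Print header and sequence lines of records that are kept.
        keep
    ' "$1" "$2"
}
```

Something like remove_contigs circular_contigs.txt assembly.fasta > assembly_clean.fasta should then give the filtered assembly. Comparing grep -c '^>' on the input and the output is a quick way to check that the expected number of contigs was dropped.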
