Question: How to remove circular contigs from assembly.fasta
3 months ago by
kristina.mahan110 wrote:

I have a list of >100 circular contigs that I would like to remove from my de novo genome assembly.fasta. How can I remove these contigs from the assembly.fasta using a text file with the contig names/numbers? Or is there another way?

3 months ago by
Mark800 wrote:

If you already know which contigs are circular, you can use the really cool seqkit tool. The grep subcommand is the perfect tool for this job.

seqkit grep assembly.fasta -n -v -f circular_contigs.txt > assembly_clean.fasta

-n matches by full name (the whole header line after ">", including any description) rather than just the ID, so the entries in your list need to match the headers exactly

-v inverts the match (i.e. only the records that are not in the list are output, so anything that's not circular)

-f specifies the file to read the patterns from (in this case the circular contig header names)

circular_contigs.txt is a list (one header per line) that identifies the circular contigs to be removed

> assembly_clean.fasta seqkit writes to the terminal (stdout) by default, so this last bit redirects the output into a new file
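As for "another way": if seqkit isn't installed, the same kind of filtering can be sketched with plain awk. This is only a rough equivalent, not what seqkit does internally. It assumes the list holds one contig ID per line (the first word of the header, without the ">"), and the toy file names and contig names below are made up so the example can be run as-is.

```shell
# Toy input, made up for illustration: three contigs, one of them to remove.
printf '>contig_1 len=8\nACGTACGT\n>contig_2 len=4\nGGCC\nTTAA\n>contig_3\nAAAA\n' > toy_assembly.fasta
printf 'contig_2\n' > toy_circular.txt

# First file: load the IDs to drop.
# Second file: on each header line decide whether to keep the record;
# the sequence lines that follow inherit the decision of their header.
awk 'NR == FNR { drop[$1] = 1; next }
     /^>/      { id = substr($1, 2); keep = !(id in drop) }
     keep' toy_circular.txt toy_assembly.fasta > toy_clean.fasta

# Sanity check: count the headers before and after.
grep -c '^>' toy_assembly.fasta   # 3
grep -c '^>' toy_clean.fasta      # 2
```

The header count before and after is a quick check for the seqkit command too: the difference should equal the number of contigs in your list. If it doesn't, some names in the list probably don't match the headers exactly. Note that the awk version needs a non-empty list file, otherwise the NR == FNR trick treats the assembly as the list.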

More info here:

Hope that helps

