How to remove circular contigs from assembly.fasta
3.7 years ago

I have a list of more than 100 circular contigs that I would like to remove from my de novo genome assembly (assembly.fasta). How can I remove these contigs from assembly.fasta using a text file with the contig names/numbers? Or is there another way?

de-novo-genome-assembly ngs • 972 views
3.7 years ago
Mark ★ 1.5k

If you already know which contigs are circular, you can use the really cool seqkit tool. The grep subcommand is the perfect tool for this job.

seqkit grep assembly.fasta -n -v -f circular_contigs.txt > assembly_clean.fasta

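-n matches against the full name (the whole header line, ID plus description) rather than just the ID, so each entry in the list has to match its header 100%. If your list only contains the IDs, leave -n out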

-v inverts the match, so only the records that are not in the list (i.e. anything that's not circular) are kept

-f specifies the file to read the patterns from (in this case the circular contig header names)

circular_contigs.txt is a list (one header per line) that identifies the circular contigs to be removed

> assembly_clean.fasta: seqkit writes to the terminal (stdout) by default, so this last bit redirects the output into a new file
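
Since the question also asks whether there is another way: below is a minimal sketch of the same filtering with plain awk, in case seqkit is not available. It is not how seqkit works internally, just an equivalent idea. It assumes the text file holds one contig ID per line, where the ID is the header text after > up to the first whitespace (so this corresponds to matching by ID, not by full name). The toy_* file names and the contig names are made up for illustration.

```shell
# Toy inputs standing in for the real files (hypothetical contig names).
printf '>contig_1 len=8\nACGTACGT\n>contig_2 len=8 circular=true\nGGCC\nTTAA\n>contig_3 len=4\nTTTT\n' > toy_assembly.fasta
printf 'contig_2\n' > toy_circular_ids.txt

# First file (NR==FNR): load the IDs to drop into a lookup table.
# Second file: on each header line, take the ID (first field minus the
# leading '>') and decide whether to keep the record; then print every
# line that belongs to a kept record, including multi-line sequences.
awk 'NR==FNR { drop[$1]; next }
     /^>/    { id = substr($1, 2); keep = !(id in drop) }
     keep' toy_circular_ids.txt toy_assembly.fasta > toy_clean.fasta

# Sanity check: number of records before and after filtering.
grep -c '^>' toy_assembly.fasta   # 3
grep -c '^>' toy_clean.fasta      # 2
```

The header count after filtering should equal the original count minus the number of listed contigs. If it does not, some IDs in the list probably do not match the headers exactly. Note that this sketch expects a non-empty ID list.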

More info here: https://bioinf.shenwei.me/seqkit/

Hope that helps
