Question: How to remove circular contigs from assembly.fasta
3 months ago by
kristina.mahan110 wrote:

I have a list of >100 circular contigs that I would like to remove from my de novo genome assembly.fasta. How can I remove these contigs from the assembly.fasta using a text file with the contig names/numbers? Or is there another way?

3 months ago by
Mark800 wrote:

If you already know which contigs are circular, you can use seqkit. Its grep subcommand is a good fit for this job:

seqkit grep -n -v -f circular_contigs.txt assembly.fasta > assembly_clean.fasta

-n matches against the full sequence name (the whole header line), not just the ID, so each entry in the list has to match a header 100%

-v inverts the match, so only records that are NOT in the list (i.e. the non-circular contigs) are written out

-f specifies the file to read the patterns from (in this case the circular contig header names)

circular_contigs.txt is a list (one header per line, without the leading ">") that identifies the circular contigs to be removed

> assembly_clean.fasta seqkit writes to the terminal (stdout), so this last bit redirects the output into a new file
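
As for "another way": if seqkit is not available, plain awk can do a similar filter. This is only a minimal sketch, not a replacement for the seqkit command above. It builds a tiny made-up example (the contig names contig_1 to contig_3 are hypothetical) in a temporary directory so nothing real gets overwritten, and unlike -n it matches on the ID only, i.e. the first whitespace-delimited word of the header.

```shell
# Work in a scratch directory so real files are never touched.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical example inputs: three contigs, one of them listed as circular.
printf '>contig_1\nACGT\n>contig_2\nGGCC\nTTAA\n>contig_3\nAAAA\n' > assembly.fasta
printf 'contig_2\n' > circular_contigs.txt

# First file (NR==FNR): remember every name to drop, tolerating a leading ">".
# Second file: on each header line decide whether to keep the record,
# then print every line that belongs to a kept record.
awk 'NR==FNR { sub(/^>/, "", $1); drop[$1]; next }
     /^>/    { name = substr($1, 2); keep = !(name in drop) }
     keep' circular_contigs.txt assembly.fasta > assembly_clean.fasta

cat assembly_clean.fasta
```

One caveat with this sketch: the NR==FNR trick assumes circular_contigs.txt is not empty, otherwise the FASTA itself gets read as the name list.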

More info here: https://bioinf.shenwei.me/seqkit/
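
Because -n needs the names to match exactly, it is worth checking afterwards that the number of contigs removed equals the number of names in your list. A rough sketch of that check, using only grep on small made-up files in a temporary directory (the contig names and file contents here are hypothetical stand-ins for your real data):

```shell
# Scratch directory with stand-in files for the real assembly and list.
workdir=$(mktemp -d) && cd "$workdir"
printf '>contig_1\nACGT\n>contig_2\nGGCC\n>contig_3\nAAAA\n' > assembly.fasta
printf '>contig_1\nACGT\n>contig_3\nAAAA\n' > assembly_clean.fasta
printf 'contig_2\n' > circular_contigs.txt

# Count FASTA headers before and after, and non-empty lines in the list.
before=$(grep -c '^>' assembly.fasta)
after=$(grep -c '^>' assembly_clean.fasta)
listed=$(grep -c . circular_contigs.txt)
removed=$((before - after))

echo "removed $removed of $listed listed contigs"
if [ "$removed" -ne "$listed" ]; then
    echo "warning: some names in the list did not match a header" >&2
fi
```

If the two numbers differ, the usual suspects are extra description text in the FASTA headers or stray whitespace in the list.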

Hope that helps
