Question: Extract fasta sequences not matching a list
5 months ago by
genomes_and_MGEs0 wrote:

Hey guys,

I know it's possible to use seqtk subseq to extract the fasta sequences whose headers match a list. Now, is there a way to extract the sequences whose headers don't exactly match any entry on the list? Thanks!

sequence assembly • 167 views
modified 5 months ago by genomax87k • written 5 months ago by genomes_and_MGEs0

Looks like you already have a solution. In general, this boils down to list manipulation. If you have a list of all sequence names and a list of those you don't want (the ones to exclude), running grep with the -v option on the two files gives you their difference, which you can then pass to seqtk subseq as usual. Add -x and -F so that names are compared as whole lines and literal strings; otherwise seq1 on the exclusion list would also remove seq10.

grep -v -x -F -f partial.lst full.lst > diff.lst
written 5 months ago by Mensur Dlakic6.0k
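
For anyone following along, here is a minimal sketch of how the list-difference idea could be wired up end to end. The file names (all.fa, partial.lst, full.lst, diff.lst, kept.fa) and the toy records are placeholders made up for illustration, and it assumes each sequence ID is the first whitespace-delimited word of its header. The -x and -F flags make grep compare whole lines as literal strings. The seqtk step only runs if seqtk is on your PATH.

```shell
# Toy input; file names and records are placeholders for illustration only.
printf '>seq1 first\nACGT\n>seq2 second\nGGCC\n>seq10 third\nTTAA\n' > all.fa
printf 'seq1\n' > partial.lst

# List every sequence ID: take header lines, strip ">" and
# anything after the first whitespace character.
grep '^>' all.fa | sed -e 's/^>//' -e 's/[[:space:]].*$//' > full.lst

# Keep the IDs that are NOT on the exclusion list.
# -x matches whole lines, -F treats list entries as literal strings.
grep -v -x -F -f partial.lst full.lst > diff.lst

# Extract the remaining records with seqtk, if it is installed.
if command -v seqtk >/dev/null 2>&1; then
    seqtk subseq all.fa diff.lst > kept.fa
fi
```

With this toy input, diff.lst ends up holding seq2 and seq10, so only seq1 is left out of kept.fa.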
5 months ago by
United States
genomax87k wrote:

You can use faSomeRecords from Jim Kent's utilities for that. With the -exclude option it writes out the records that are not named in the list file.

faSomeRecords - Extract multiple fa records
   faSomeRecords in.fa listFile out.fa
   -exclude - output sequences not in the list file.
written 5 months ago by genomax87k
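
If neither seqtk nor faSomeRecords is at hand, the same exclusion can probably be done in a single awk pass. This is only a sketch, not a replacement for either tool: the file names (in.fa, exclude.lst, out.fa) and the toy records are made up for illustration, and it assumes the sequence ID is the first whitespace-delimited word of each header and that the list file holds one ID per line.

```shell
# Toy input; file names and records are placeholders for illustration only.
printf '>seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n>seq10 third\nTTAA\n' > in.fa
printf 'seq1\n' > exclude.lst

# While reading the list file, remember every ID to drop.
# While reading the FASTA file, decide at each header whether the
# record is kept, then print every line for as long as keep is set.
awk '
    FILENAME == ARGV[1] { drop[$1] = 1; next }
    /^>/                { id = substr($1, 2); keep = !(id in drop) }
    keep
' exclude.lst in.fa > out.fa
```

Because IDs are looked up as exact array keys, seq1 on the list does not remove seq10, and multi-line records are dropped or kept as a whole.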