Question: Extract FASTA sequences not matching a list
5 months ago by
genomes_and_MGEs wrote:

Hey guys,

I know it's possible to use seqtk subseq to extract the FASTA sequences matching a list. Now, is there a way to extract the sequences whose headers don't exactly match any entry on the list? Thanks!

sequence assembly • 167 views

Looks like you already have a solution. In general, this boils down to list manipulation. If you have a list of all sequence names and a list of those you want to exclude, running grep with the -v option on the two files gives you their difference, which can then be used as the list in the usual way (for example with seqtk subseq). Adding -x -F makes grep compare whole lines as literal strings, so that excluding seq1 does not also drop seq10.

grep -v -x -F -f partial.lst full.lst > diff.lst
written 5 months ago by Mensur Dlakic
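
For anyone who does not yet have a full.lst, here is a rough sketch of the whole list-manipulation route. The function name complement_ids and the file names in.fa and out.fa are made up for illustration, and it assumes the exclude list holds one sequence name per line without the leading ">". The seqtk step is left as a comment because it needs seqtk installed.

```shell
# complement_ids FASTA EXCLUDE_LIST
# Print the names of all records in FASTA that are NOT listed in EXCLUDE_LIST.
# (Hypothetical helper; names are the first word of each header, without ">".)
complement_ids() {
  grep '^>' "$1" |                       # keep header lines only
    sed 's/^>//; s/[[:space:]].*$//' |   # strip ">" and any description
    grep -v -x -F -f "$2"                # drop names that match a list line exactly
}

# Possible usage (file names are placeholders):
#   complement_ids in.fa partial.lst > diff.lst
#   seqtk subseq in.fa diff.lst > out.fa
```

The -x and -F flags matter here: without them grep treats each list entry as a pattern that may match anywhere in the line, so a short name can knock out longer names that contain it.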
5 months ago by
genomax wrote:

You can use faSomeRecords from Jim Kent's utils for that.

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.
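
Going by the usage text above, the call would presumably look like `faSomeRecords -exclude in.fa listFile out.fa`. If faSomeRecords is not available, something similar can be approximated with plain awk. This is only a sketch, not how faSomeRecords itself works: the function name exclude_fasta is made up, and it assumes list entries and record names are compared on the first word of the header, with or without a leading ">".

```shell
# exclude_fasta LIST FASTA
# Print every FASTA record whose name (first word of the header) is NOT in LIST.
# (Hypothetical helper; an awk stand-in, not part of Kent's utilities.)
exclude_fasta() {
  awk -v list="$1" '
    BEGIN {
      # Load the names to exclude, one per line.
      while ((getline line < list) > 0) {
        sub(/^>/, "", line)          # tolerate entries written with ">"
        sub(/[ \t].*$/, "", line)    # keep only the first word
        if (line != "") skip[line] = 1
      }
      close(list)
    }
    /^>/ { id = substr($1, 2); keep = !(id in skip) }  # decide once per record
    keep                                               # print header and sequence lines
  ' "$2"
}

# Possible usage (file names are placeholders):
#   exclude_fasta listFile in.fa > out.fa
```

Because the decision is made on each header line and then carried over to the sequence lines that follow, multi-line records are kept or dropped as a whole.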