Extract fasta sequences not matching a list
4.2 years ago

Hey guys,

I know it's possible to use seqtk subseq to extract fasta sequences matching a list. Now, is there a way to extract the sequences whose headers don't exactly match any header on the list? Thanks!

Assembly sequence • 2.7k views

Looks like you already have a solution. In general, this boils down to list manipulation. If you have a list of all sequence IDs and a list of those you want to exclude, grep with the -v option gives you their difference, which you can then pass to seqtk subseq as usual. Add -x and -F so that each entry is matched as a whole line rather than as a regular-expression substring (otherwise excluding seq1 would also drop seq10).

grep -v -x -F -f partial.lst full.lst > diff.lst
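
A small sketch of why exact matching matters for this kind of list difference. The IDs, file contents, and temporary directory are made up for illustration. By default grep treats each list entry as a regular expression and matches substrings; -F (fixed strings) together with -x (whole-line match) restricts it to exact IDs.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)

# All sequence IDs (hypothetical), one per line.
printf '%s\n' seq1 seq2 seq10 seq11 > "$workdir/full.lst"

# IDs to exclude.
printf '%s\n' seq1 > "$workdir/partial.lst"

# Substring matching: excluding seq1 also removes seq10 and seq11.
grep -v -f "$workdir/partial.lst" "$workdir/full.lst" > "$workdir/loose.lst"

# Exact matching: -F treats entries as fixed strings, -x requires a whole-line match.
grep -v -x -F -f "$workdir/partial.lst" "$workdir/full.lst" > "$workdir/diff.lst"

cat "$workdir/diff.lst"
```

The resulting diff.lst can then be used as the name list for seqtk subseq, for example `seqtk subseq in.fa diff.lst > out.fa`.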
4.2 years ago
GenoMax 141k

You can use faSomeRecords from Jim Kent's utils for that.

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.
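
If installing the Kent utilities is not an option, a similar exclusion can be approximated with awk. This is a minimal sketch and not faSomeRecords itself. It assumes the list holds one ID per line without the leading ">", that the list is not empty, and that the ID is the first whitespace-delimited word of each header. The file names and records are hypothetical.

```shell
# Work in a throwaway directory with a tiny example FASTA file.
workdir=$(mktemp -d)

cat > "$workdir/in.fa" <<'EOF'
>seq1 first record
ACGT
ACGT
>seq2 second record
GGCC
>seq10 third record
TTAA
EOF

# IDs to exclude, one per line.
printf '%s\n' seq1 > "$workdir/exclude.lst"

awk '
    # First file (the list): remember every ID to exclude.
    NR == FNR { drop[$1]; next }
    # Header line: strip ">" and take the first word as the ID.
    /^>/ { id = substr($1, 2); keep = !(id in drop) }
    # Print header and sequence lines of records that are kept.
    keep
' "$workdir/exclude.lst" "$workdir/in.fa" > "$workdir/out.fa"

cat "$workdir/out.fa"
```

Because the lookup is an exact match on the ID, excluding seq1 leaves seq10 in the output.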