I know it's possible to use seqtk subseq to extract the FASTA records whose names match a list. Now, is there a way to extract the records whose headers do not exactly match any name on the list?
Looks like you already have a solution. In general, this boils down to list manipulation. If you have a list of all sequence names and a list of those you want to exclude, grep with the -v option gives you the difference, which you can then pass to seqtk subseq as usual. Adding -F and -x makes grep treat each name as a fixed string that must match the whole line, so seq1 on the exclude list does not also drop seq10.
grep -v -x -F -f partial.lst full.lst > diff.lst
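For completeness, here is a minimal end-to-end sketch of that approach on toy data. The file names (seqs.fa, partial.lst, full.lst, diff.lst, kept.fa) are hypothetical, and it assumes the list holds sequence names without the leading ">" and without any description text. The last step uses awk only so that the example runs without extra tools; with seqtk installed, seqtk subseq seqs.fa diff.lst should serve the same purpose.

```shell
# Toy input; all file names here are hypothetical.
printf '>seq1\nACGT\n>seq10\nGGCC\n>seq2\nTTAA\n' > seqs.fa
printf 'seq1\n' > partial.lst   # names to exclude

# Full list of names: header lines, minus '>' and any trailing description.
grep '^>' seqs.fa | sed 's/^>//; s/[[:space:]].*$//' > full.lst

# Names to keep: -v inverts, -x matches whole lines, -F disables regex.
grep -v -x -F -f partial.lst full.lst > diff.lst

# Extract the kept records (stand-in for: seqtk subseq seqs.fa diff.lst).
awk 'NR == FNR { keep[$1]; next }
     /^>/ { name = substr($1, 2); show = (name in keep) }
     show' diff.lst seqs.fa > kept.fa
```

On this toy input, diff.lst ends up holding seq10 and seq2, and kept.fa holds those two records. Note that the awk step assumes diff.lst is not empty.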
You can use faSomeRecords from Jim Kent's utilities for that. Run it with the -exclude option to output the records that are not in the list file.
faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.